(54) BIOCHEMICAL INFORMATION PROCESSOR, BIOCHEMICAL INFORMATION PROCESSING
METHOD, AND BIOCHEMICAL INFORMATION RECORDING MEDIUM

(57) A biochemical information processing appara- 
tus comprises storage means storing biochemical infor- 
mation, input means for accepting input of data, 
reaction scheme detection means for detecting a chem- 
ical reaction scheme involving a compound, based on 
the data, and display means for indicating a reaction 
scheme diagram of the chemical reaction scheme. The 
storage means comprises a compound information file, 
an enzyme information file, and a relation information 
file, and the relation information file stores a list showing
the relation among compound numbers of compounds
as a key, enzyme numbers of enzymes with either
pertinent compound being a substrate, and enzyme
numbers of enzymes with either pertinent compound being
a product. The reaction scheme detection means
comprises a first process portion for preparing canonical
data of the compound from the data and searching the 
compound information file based thereon to read out a 
compound number, a second process portion for read- 
ing an enzyme number of an enzyme with the com- 
pound being a substrate or a product out of the relation 
information file, based on the compound number, a third 
process portion for reading a compound number of 
another compound constituting a reaction system with 
the enzyme and additional information of the enzyme, 
and a fourth process portion for indicating a reaction 
scheme diagram of the compound on the display means 
from the compound number and the enzyme number. 
Description
Technical Field

The present invention relates to a processing apparatus
and processing method for processing information
in the biochemical field, and more particularly, to a
processing apparatus and processing method that can
search for a reaction path of a bio-related compound
and continuously display the reaction path and that can
obtain information concerning bio-related substances.

Further, the present invention concerns an informa- 
tion recording medium (computer program product), 
such as a flexible disk or a magnetic tape, in which bio- 
chemical information is recorded, and more particularly, 
the invention concerns an information recording 
medium having records of information for searching for 
a reaction path of a bio-related compound, information 
for continuously displaying the reaction path, informa- 
tion concerning the bio-related substances, and so on. 

Background Art 

Compound database systems and programs storing
compound information and reaction database systems
and programs storing reaction information of
compounds have been developed heretofore. The compound
database systems and programs store
compound information such as the physical properties and
action of existing compounds, and access is made
to the compound information with the structure of a
compound as a key. The reaction database systems
store the reaction information of existing compounds,
and access is made to the reaction information
with the structure of a compound as a key.

An example of such a compound database is
"MACCS", which is a compound control system available
from MDL Inc., Co., the United States. Examples of the
reaction database systems include the integrated 
chemical information control system "ISIS" and reaction 
information control system "REACCS" available from 
MDL Inc., Co., the United States. 

There are, however, no conventional compound/reaction
database systems storing the relationship
between compound and enzyme and the
information concerning the bio-related substances in an
integrated manner. Because of this, using the structure of
a compound as a key, one was unable to efficiently
obtain the information concerning the enzymes or the
biochemical information related to the enzymes, substrates,
and products. Also, there are no conventional
compound/reaction database systems including a reaction
path of plural compounds constructed in an integrated
manner. It was, therefore, not possible to
efficiently search for the reaction path involving a plurality
of compounds.

Further, there are no conventional compound/reaction
database systems collectively storing information
concerning receptors existing for control of bio-function
or for transmission of information in vivo, and the
information concerning the bio-related substances (agonists
and antagonists). It was, therefore, not possible to
efficiently obtain the biochemical information related to the
receptors, agonists, and antagonists.

An object of the present invention is to provide a
biochemical information processing apparatus,
biochemical information processing method, and information
recording medium (computer program product),
solving the above problems, which can permit one, even
in the case of the structure of a compound being used
as a key, to efficiently obtain the information concerning
the enzymes or the biochemical information related to
the enzymes, substrates, and products, which can permit
one to efficiently search for a reaction path involving
a plurality of compounds, and which can permit one to
efficiently obtain the biochemical information related to
the receptors, agonists, and antagonists.

Disclosure of invention 

First explained is the biochemical information
processing apparatus of the present invention.

The biochemical information processing apparatus
of the present invention is a biochemical information
processing apparatus comprising

storage means for storing biochemical information
about compounds and enzymes,

input means for accepting input of image data
indicating said biochemical information or symbolic
data indicating said biochemical information,

reaction scheme detection means for, when said
input means accepts data about a compound being
a substrate and/or a product, detecting a chemical
reaction scheme involving said compound, based
on the data, and

display means for indicating at least a reaction
scheme diagram of the chemical reaction scheme;

wherein said storage means comprises

a compound information file storing a list showing
the relation between compound numbers of
the compounds and canonical data corresponding
to said compounds, and additional
information about said compounds,

an enzyme information file storing a list showing
the relation among enzyme numbers of the
enzymes, compound numbers of compounds
being substrates for said enzymes, and compound
numbers of compounds being products
by said enzymes, and additional information
about said enzymes, and

a relation (correlation) information file storing a
list showing the relation among compound
numbers of compounds as a key, enzyme numbers
of enzymes with either said compound
being a substrate, and enzyme numbers of
enzymes with either said compound being a
product; and

wherein said reaction scheme detection
means comprises

a first process portion for preparing from
the data about a compound accepted
through said input means said canonical
data uniquely indicating a chemical structure
of said compound, further searching
said compound information file, based on
the canonical data, and thereby reading
out a compound number corresponding to
said canonical data when said canonical
data exists in said compound information
file,

a second process portion for reading an
enzyme number of an enzyme with the
compound being a substrate or a product
out of said relation information file, based
on the compound number read out in said
first process portion,

a third process portion for reading a compound
number of another compound constituting
a reaction system together with
the enzyme of the enzyme number read
out in said second process portion and
said compound, and additional information
about said enzyme out of said enzyme
information file, and

a fourth process portion for indicating a
reaction scheme diagram of the compound
accepted through said input means on said
display means from the compound number
read out in said first process portion, the
enzyme number read out in said second
process portion, and the compound
number of the other compound read out
in said third process portion, and further
indicating the additional information about
the enzyme read out in said third process
portion on said display means.

With the above biochemical information processing
apparatus of the present invention, when the data about
the compound accepted through the input means is
supplied to the first process portion, the canonical data
is prepared from this data. Then the compound information
file is searched based on the canonical data thus
prepared, and if the canonical data exists in the compound
information file, a compound number corresponding
to the canonical data is read out thereof. The
compound number read out in the first process portion
is supplied to the second process portion, and the second
process portion reads an enzyme number of an
enzyme with this compound being a substrate or a product
out of the relation information file.



The enzyme number read out in the second process
portion is supplied to the third process portion, and
the third process portion reads a compound number of
another compound constituting a reaction system
together with the enzyme and the foregoing compound,
and additional information about the enzyme out of the
enzyme information file. Then the compound number
read out in the first process portion, the enzyme number
read out in the second process portion, and the compound
number of the other compound read out in the
third process portion are supplied to the fourth process
portion, and the fourth process portion lets the display
means indicate a reaction scheme diagram of the compound
accepted through the input means. Similarly, the
additional information about the enzyme read out in the
third process portion is also indicated on the display
means.
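
The flow through the first to fourth process portions can be pictured with a
minimal sketch. It is illustrative only and is not taken from the specification:
the in-memory dictionaries standing in for the three files, the names
compound_file, enzyme_file, relation_file, canonicalize and
detect_reaction_schemes, and all sample numbers are hypothetical, and the
canonicalize function is a trivial placeholder for the canonical data
preparation explained later.

```python
from dataclasses import dataclass


@dataclass
class EnzymeRecord:
    """One hypothetical row of the enzyme information file."""
    substrates: list[int]   # compound numbers of substrates
    products: list[int]     # compound numbers of products
    info: str = ""          # additional information about the enzyme


# Hypothetical stand-ins for the three files (all sample values are invented).
compound_file = {"CANON-A": 1, "CANON-B": 2}            # canonical data -> compound number
enzyme_file = {10: EnzymeRecord([1], [2], "example enzyme")}
relation_file = {                                        # keyed by compound number
    1: {"substrate_of": [10], "product_of": []},
    2: {"substrate_of": [], "product_of": [10]},
}


def canonicalize(data: str) -> str:
    """Placeholder only: real canonical data preparation is far more involved."""
    return data.strip().upper()


def detect_reaction_schemes(data: str) -> list[dict]:
    # First process portion: canonical data -> compound number.
    number = compound_file.get(canonicalize(data))
    if number is None:
        return []
    # Second process portion: enzyme numbers from the relation information file.
    relation = relation_file.get(number, {})
    enzyme_numbers = relation.get("substrate_of", []) + relation.get("product_of", [])
    # Third process portion: partner compounds and additional enzyme information.
    schemes = []
    for enzyme_number in enzyme_numbers:
        record = enzyme_file[enzyme_number]
        schemes.append({
            "enzyme": enzyme_number,
            "substrates": record.substrates,
            "products": record.products,
            "info": record.info,
        })
    # Fourth process portion: a real system would draw the diagram; here the
    # data needed for the diagram is simply returned.
    return schemes
```

Calling detect_reaction_schemes with data for compound 1 or compound 2 returns
the same single scheme, because the relation information file lists enzyme 10
under both the substrate and the product.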

The biochemical information processing apparatus
of the present invention may further comprise receptor
information detection means for, when said input means
accepts data about a compound, detecting additional
information about a receptor with said compound being
an agonist and/or an antagonist, based on the data, and
in this case;

said storage means further stores biochemical
information about receptors, and
further comprises a receptor information file storing
a list showing the relation between receptor numbers
of the receptors and compound numbers of
compounds being agonists and/or antagonists for
said receptors, and additional information about
said receptors;

said relation information file stores a list to show the
relation among the compound numbers of the compounds
as a key, the enzyme numbers of the
enzymes with either said compound being a substrate,
the enzyme numbers of the enzymes with
either said compound being a product, the receptor
numbers of the receptors with either said compound
being an agonist, and the receptor numbers
of the receptors with either said compound being an
antagonist; and

said receptor information detection means comprises

a fifth process portion for preparing from data
about a compound accepted through said input
means said canonical data uniquely indicating
a chemical structure of said compound, further
searching said compound information file,
based on said canonical data, and thereby
reading out a compound number corresponding
to said canonical data when said canonical
data exists in said compound information file,

a sixth process portion for reading, based on
the compound number read out in said fifth
process portion, a receptor number of a receptor
with the compound being an agonist or an
antagonist out of said relation information file,

a seventh process portion for reading at least
additional information about the receptor of the
receptor number read out in said sixth process
portion out of said receptor information file, and

an eighth process portion for indicating at least
the additional information about the receptor
read out in said seventh process portion on
said display means.

EP 0 829 810 A1

In this case, in the biochemical information
processing apparatus of the present invention, when the
data about the compound accepted through the input
means is supplied to the fifth process portion, canonical
data is prepared from this data. Then the compound
information file is searched based on the canonical data
thus prepared, and if the canonical data exists in the
compound information file, a compound number corresponding
to the canonical data is read out thereof. The
compound number read out in the fifth process portion
is supplied to the sixth process portion, and the sixth
process portion reads a receptor number of a receptor
with this compound being an agonist or an antagonist
out of the relation information file. The receptor number
read out in the sixth process portion is supplied to the
seventh process portion, and the seventh process portion
reads at least the additional information about the
receptor out of the receptor information file. Then at
least the additional information about the receptor read
out in the seventh process portion is supplied to the
eighth process portion, and the eighth process portion
lets the display means indicate at least the additional
information about the receptor.
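
The receptor lookup performed by the fifth to eighth process portions can be
sketched in the same spirit. This is a hedged illustration, not the
specification's implementation: the dictionaries, the keys "agonist_of" and
"antagonist_of", the function name detect_receptor_info and the sample
receptors are all assumptions, and the input is taken to be canonical data
that has already been prepared.

```python
# Hypothetical stand-ins for the files; every value below is invented.
compound_file = {"CANON-X": 7}                      # canonical data -> compound number
relation_file = {                                   # keyed by compound number
    7: {"agonist_of": [100], "antagonist_of": [101]},
}
receptor_file = {                                   # keyed by receptor number
    100: {"name": "receptor R1", "agonists": [7], "antagonists": []},
    101: {"name": "receptor R2", "agonists": [], "antagonists": [7]},
}


def detect_receptor_info(canonical_data: str) -> list[tuple[int, str, str]]:
    # Fifth process portion: look up the compound number (canonical data is
    # assumed to have been prepared already).
    number = compound_file.get(canonical_data)
    if number is None:
        return []
    # Sixth process portion: receptor numbers from the relation information file.
    relation = relation_file.get(number, {})
    results = []
    for role, key in (("agonist", "agonist_of"), ("antagonist", "antagonist_of")):
        for receptor_number in relation.get(key, []):
            # Seventh process portion: additional information about the receptor.
            name = receptor_file[receptor_number]["name"]
            results.append((receptor_number, role, name))
    # Eighth process portion: a real system would display this; it is returned here.
    return results
```

Keeping the role ("agonist" or "antagonist") alongside each receptor number is a
design choice of the sketch, so that a display step could distinguish the two
relations.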

Also, the biochemical information processing apparatus
of the present invention may further comprise
reaction path detection means for, when said input
means accepts data about a predetermined compound
selected from a plurality of compounds constituting a
reaction path, detecting the reaction path of said plurality
of compounds, based on the data, and in this case;

said reaction path detection means comprises

a ninth process portion for preparing from the
data about the compound accepted through
said input means said canonical data uniquely
indicating a chemical structure of said compound,
further searching said compound information
file, based on the canonical data, and
thereby reading out a compound number corresponding
to said canonical data when said
canonical data exists in said compound information
file,

a tenth process portion for reading, based on
the compound number read out in said ninth
process portion, an enzyme number of an
enzyme with the compound being a substrate
and an enzyme number of an enzyme with the
compound being a product out of said relation
information file,

an eleventh process portion for reading, based
on each enzyme number read out in said tenth
process portion, a compound number of a
compound being a substrate for said enzyme
and a compound number of a compound being
a product by said enzyme out of said enzyme
information file,

a twelfth process portion for repeating a process
by said tenth process portion and a process
by said eleventh process portion to obtain
compounds and enzymes within the predetermined
reaction path, and

a thirteenth process portion for indicating from
enzyme numbers read out in said tenth process
portion and compound numbers read out in
said eleventh process portion a reaction
scheme diagram of these compounds along
the reaction path on said display means.

In this case, in the biochemical information
processing apparatus of the present invention, when the
data about the compound accepted through the input
means is supplied to the ninth process portion, canonical
data is prepared from this data. Then the compound
information file is searched based on the canonical data
thus prepared, and if the canonical data exists in the
compound information file, a compound number corresponding
to the canonical data is read out thereof. The
compound number read out in the ninth process portion
is supplied to the tenth process portion, and the tenth
process portion reads an enzyme number of an enzyme
with the compound being a substrate and an enzyme
number of an enzyme with the compound being a product
out of the relation information file.

Each enzyme number read out in the tenth process 
portion is supplied to the eleventh process portion, and 
the eleventh process portion reads a compound number 
of a compound being a substrate for the enzyme and a 
compound number of a compound being a product by 
the enzyme out of the enzyme information file. The 
processes of the tenth process portion and the eleventh 
process portion are repeated in the twelfth process por- 
tion. 

Then the enzyme numbers read out in the tenth 
process portion and the compound numbers read out in 
the eleventh process portion are supplied to the thir- 
teenth process portion, and the thirteenth process por- 
tion lets the display means indicate a reaction scheme 
diagram of these compounds along a predetermined 
reaction path. 
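
The repetition carried out by the twelfth process portion amounts to a
traversal that alternates between the relation information file and the
enzyme information file. The sketch below uses a breadth-first traversal as one
plausible reading; it is not stated in the text that the apparatus proceeds
this way. The dictionaries, the function name detect_reaction_path and the
sample numbers are hypothetical, and the sketch follows every reachable enzyme
rather than limiting itself to a predetermined reaction path.

```python
from collections import deque

# Hypothetical files; all numbers are invented for illustration.
# enzyme number -> (substrate compound numbers, product compound numbers)
enzyme_file = {10: ([1], [2]), 11: ([2], [3])}
# compound number -> (enzymes it is a substrate of, enzymes it is a product of)
relation_file = {1: ([10], []), 2: ([11], [10]), 3: ([], [11])}


def detect_reaction_path(start_compound: int) -> list[tuple]:
    """Collect (substrates, enzyme, products) steps reachable from one compound."""
    seen_compounds = {start_compound}
    seen_enzymes = set()
    steps = []
    queue = deque([start_compound])
    while queue:                                   # twelfth process portion: repeat
        compound = queue.popleft()
        substrate_of, product_of = relation_file.get(compound, ([], []))
        for enzyme in substrate_of + product_of:   # tenth process portion
            if enzyme in seen_enzymes:
                continue
            seen_enzymes.add(enzyme)
            substrates, products = enzyme_file[enzyme]   # eleventh process portion
            steps.append((tuple(substrates), enzyme, tuple(products)))
            for neighbour in substrates + products:
                if neighbour not in seen_compounds:
                    seen_compounds.add(neighbour)
                    queue.append(neighbour)
    # Thirteenth process portion: a real system would draw the diagram; the
    # steps are returned sorted by enzyme number for a stable order.
    return sorted(steps, key=lambda step: step[1])
```

Starting from the middle compound 2, the sketch recovers both the upstream step
(compound 1 to 2 via enzyme 10) and the downstream step (compound 2 to 3 via
enzyme 11), which is why both the substrate and the product enzyme numbers are
read in the tenth process portion.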

Further, the biochemical information processing
apparatus of the present invention may be the following
one. Namely, the apparatus is a biochemical information
processing apparatus comprising

storage means for storing biochemical information
about compounds and enzymes,

input means for accepting input of image data
indicating said biochemical information or symbolic
data indicating said biochemical information,

reaction path detection means for, when said input
means accepts data about a predetermined compound
selected from a plurality of compounds constituting
a reaction path, detecting the reaction path
of said plurality of compounds, based on the data,
and

display means for indicating at least a reaction
scheme diagram of the chemical reaction scheme;

wherein said storage means comprises

a compound information file storing a list showing
the relation between compound numbers of
the compounds and canonical data corresponding
to said compounds, and additional
information about said compounds,

an enzyme information file storing a list showing
the relation among enzyme numbers of the
enzymes, compound numbers of compounds
being substrates for said enzymes, and compound
numbers of compounds being products
by said enzymes, and additional information
about said enzymes, and

a relation (correlation) information file storing a
list showing the relation among compound
numbers of compounds as a key, enzyme numbers
of enzymes with either said compound
being a substrate, and enzyme numbers of
enzymes with either said compound being a
product; and

wherein said reaction path detection
means comprises

a ninth process portion for preparing from
the data about the compound accepted
through said input means said canonical
data uniquely indicating a chemical structure
of said compound, further searching
said compound information file, based on
the canonical data, and thereby reading
out a compound number corresponding to
said canonical data when said canonical
data exists in said compound information
file,

a tenth process portion for reading, based
on the compound number read out in said
ninth process portion, an enzyme number
of an enzyme with the compound being a
substrate and an enzyme number of an
enzyme with the compound being a product
out of said relation information file,

an eleventh process portion for reading,
based on each enzyme number read out in
said tenth process portion, a compound
number of a compound being a substrate
for said enzyme and a compound number
of a compound being a product by said
enzyme out of said enzyme information
file,

a twelfth process portion for repeating a
process by said tenth process portion and
a process by said eleventh process portion
to obtain compounds and enzymes within
the predetermined reaction path, and

a thirteenth process portion for indicating
from the enzyme numbers read out in said
tenth process portion and compound numbers
read out in said eleventh process portion
a reaction scheme diagram of these
compounds along the reaction path on
said display means.

In this case, the biochemical information processing
apparatus of the present invention may further comprise
receptor information detection means for, when
said input means accepts data about a compound,
detecting additional information about a receptor with
said compound being an agonist and/or an antagonist,
based on the data, and in this case;

said storage means further stores biochemical
information about receptors, and
further comprises a receptor information file storing
a list showing the relation between receptor numbers
of the receptors and compound numbers of
compounds being agonists and/or antagonists for
said receptors, and additional information about
said receptors;

said relation information file stores a list to show the
relation among the compound numbers of the compounds
as a key, the enzyme numbers of the
enzymes with either said compound being a substrate,
the enzyme numbers of the enzymes with
either said compound being a product, the receptor
numbers of the receptors with either said compound
being an agonist, and the receptor numbers
of the receptors with either said compound being an
antagonist; and

said receptor information detection means comprises

a fifth process portion for preparing from data
about a compound accepted through said input
means said canonical data uniquely indicating
a chemical structure of said compound, further
searching said compound information file,
based on said canonical data, and thereby
reading out a compound number corresponding
to said canonical data when said canonical
data exists in said compound information file,

a sixth process portion for reading, based on
the compound number read out in said fifth
process portion, a receptor number of a receptor
with the compound being an agonist or an
antagonist out of said relation information file,

a seventh process portion for reading at least
additional information about the receptor of the
receptor number read out in said sixth process
portion out of said receptor information file, and

an eighth process portion for indicating at least
the additional information about the receptor
read out in said seventh process portion on
said display means.

Further, in the biochemical information processing
apparatus of the present invention, preferably,

said input means accepts input of characteristic
data about each of atoms constituting a compound
and bonding pair data between atoms; and

said biochemical information processing apparatus
preferably further comprises the following canonical
data preparation means for preparing canonical
data capable of uniquely specifying a chemical
structure of said compound, based on each data
accepted through said input means. Namely, said
canonical data preparation means comprises

a constituent atom classification process portion
for classifying, based on each data
accepted through said input means, the atoms
into different classes each for equivalent atoms
and assigning, to each atom, a different class
number for each class,

a canonical number assignment process portion
for assigning canonical numbers uniquely
corresponding to the structure of said compound
to the respective atoms, based on the
class numbers assigned to the respective
atoms in said constituent atom classification
process portion, and

a canonical data preparation process portion
for preparing said canonical data, based on the
canonical numbers assigned to the respective
atoms in said canonical number assignment
process portion.

With the canonical data preparation means according
to the present invention having the above structure,
the characteristic data about each atom and bonding
pair data between atoms accepted through the input
means is supplied to the canonical data preparation
means. Then the canonical data preparation means
prepares the canonical data, based on these data.

Namely, the canonical data preparation means first
carries out the process of constituent atom classification
process portion to classify the atoms into different
classes each for equivalent atoms, based on the
characteristic data about each atom and the bonding pair
data between atoms. Then class numbers of respective
classes different from each other are assigned to the
respective atoms. Next, the process of canonical
number assignment process portion is carried out to
assign canonical numbers uniquely corresponding to
the structure of the compound to the respective atoms,
based on the class numbers assigned to the respective
atoms and the bonding pair data between atoms. Further,
the process of canonical data preparation process
portion is carried out to prepare the canonical data
based on the canonical numbers assigned to the
respective atoms and the characteristic data about the
respective atoms.

Here, preferably, said constituent atom classification
process portion assigns three types of attributes (a_i,
b_ij, d_ij) to each atom and, utilizing the fact that atoms
different in even only one of these attributes can be
determined to be not equivalent, assigns a different class
number for each equivalent atom to each atom,

where among said three types of attributes (a_i,
b_ij, d_ij), a_i is a kind number of an atom of input number i,
b_ij is the number of bonds adjoining the atom of input
number i and having a bond kind number being j, and d_ij
is the number of routes that can be traced from the atom
of input number i through j bonds in the shortest path;

said canonical number assignment process portion
is arranged so that when in a process for assigning
a canonical number to each atom in the ascending
order from 1 the canonical number 1 is given to an
atom with a highest priority of said class number
and thereafter canonical numbers up to the canonical
number n are assigned in that manner, said
canonical number assignment process portion
selects an atom with a minimum canonical number
out of atoms already having their respective canonical
numbers and bonding to an atom having no
canonical number yet and then gives a canonical
number n + 1 to an atom with a highest priority of
said class number out of atoms bonding to said
selected atom and having no canonical number yet;
and

said canonical data preparation process portion
gives three types of attributes (P_i, T_i, S_i) to each
atom and aligns these attributes in line to prepare
said canonical data,

where among said three types of attributes
(P_i, T_i, S_i), P_i is a canonical number of an atom
bonding to an atom of canonical number i and having
a minimum canonical number, T_i is a symbol for
a type of a bond between the atom of canonical
number i and the atom of canonical number P_i, and
S_i is a symbol for a kind of the atom of canonical
number i.
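
The canonical number assignment and the (P_i, T_i, S_i) attributes described
above can be illustrated with a small sketch. It follows the wording of this
passage literally and makes several assumptions that the text does not
confirm: class numbers are supplied as input rather than computed, a smaller
class number is treated as a higher priority, ties are broken by input
number, the molecule is connected, and an atom with no bonds receives P = 0
and an empty bond symbol. The function names and the three-atom example are
hypothetical.

```python
def assign_canonical_numbers(class_numbers, bonds):
    """Return {input atom index: canonical number}, numbering from 1.

    class_numbers[i] is the class number of input atom i (assumed given);
    bonds maps an (atom, atom) pair to a bond type symbol.
    """
    n = len(class_numbers)
    neighbours = {i: set() for i in range(n)}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def priority(atom):
        # Assumption: smaller class number means higher priority.
        return (class_numbers[atom], atom)

    canon = {min(range(n), key=priority): 1}
    while len(canon) < n:
        # Numbered atoms that still bond to an atom without a canonical number.
        frontier = [a for a in canon if any(x not in canon for x in neighbours[a])]
        if not frontier:
            raise ValueError("structure is not connected")
        anchor = min(frontier, key=canon.get)          # minimum canonical number
        candidates = [x for x in neighbours[anchor] if x not in canon]
        canon[min(candidates, key=priority)] = len(canon) + 1
    return canon


def canonical_data(kinds, bonds, canon):
    """List (P_i, T_i, S_i) for canonical numbers i = 1..n, read literally."""
    by_canon = {number: atom for atom, number in canon.items()}
    rows = []
    for number in range(1, len(canon) + 1):
        atom = by_canon[number]
        partners = [x for pair in bonds if atom in pair for x in pair if x != atom]
        if not partners:
            rows.append((0, "", kinds[atom]))          # assumption for isolated atoms
            continue
        partner = min(partners, key=canon.get)         # bonded atom with minimum number
        bond_symbol = bonds.get((atom, partner)) or bonds.get((partner, atom))
        rows.append((canon[partner], bond_symbol, kinds[atom]))
    return rows


# Hypothetical three-atom chain C-C-O with invented class numbers.
kinds = ["C", "C", "O"]
bonds = {(0, 1): "-", (1, 2): "-"}
canon = assign_canonical_numbers([3, 2, 1], bonds)
rows = canonical_data(kinds, bonds, canon)
```

Because the numbering depends only on class numbers and connectivity, supplying
the same structure with its atoms entered in a different order is intended to
yield the same rows, provided the class numbers themselves are assigned
consistently.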

Next explained is the biochemical information
processing method of the present invention.

The biochemical information processing method of
the present invention is a biochemical information
processing method using an information processing
apparatus comprising

storage means for storing biochemical information
about compounds and enzymes,

input means for accepting input of image data
indicating said biochemical information or symbolic
data indicating said biochemical information, and

display means for indicating at least a reaction
scheme diagram of a chemical reaction scheme;

wherein said storage means comprises

a compound information file storing a list showing
the relation between compound numbers of
the compounds and canonical data corresponding
to said compounds, and additional
information about said compounds,

an enzyme information file storing a list showing
the relation among enzyme numbers of the
enzymes, compound numbers of compounds
being substrates for said enzymes, and compound
numbers of compounds being products
by said enzymes, and additional information
about said enzymes, and

a relation (correlation) information file storing a
list showing the relation among compound
numbers of compounds as a key, enzyme numbers
of enzymes with either said compound
being a substrate, and enzyme numbers of
enzymes with either said compound being a
product; and

wherein said biochemical information
processing method comprises

a first step for, when said input means
accepts data about a compound being a
substrate and/or a product, preparing said
canonical data uniquely indicating a chemical
structure of said compound from the
data, further searching said compound
information file, based on the canonical
data, and thereby reading out a compound
number corresponding to said canonical
data when said canonical data exists in
said compound information file,

a second step for reading an enzyme
number of an enzyme with the compound
being a substrate or a product out of said
relation information file, based on the compound
number read out in said first step,

a third step for reading a compound
number of another compound constituting
a reaction system together with the
enzyme of the enzyme number read out in
said second step and said compound, and
additional information about said enzyme
out of said enzyme information file, and

a fourth step for indicating a reaction
scheme diagram of the compound
accepted through said input means on said
display means from the compound number
read out in said first step, the enzyme
number read out in said second step, and
the compound number of the other compound
read out in said third step, and further
indicating the additional information
about the enzyme read out in said third
step on said display means.

With the above biochemical information processing
method of the present invention, the processes of the
first step to the fourth step enable detection of a reaction
scheme. In the detection of a reaction scheme, first, the
process of the first step is carried out to prepare canonical
data from the data about the compound accepted
through the input means. Then the compound information
file is searched based on the canonical data thus
prepared, and if the canonical data exists in the compound
information file, a compound number corresponding
to the canonical data is read out thereof. Next,
the process of the second step is carried out to read out
an enzyme number of an enzyme with the compound
being a substrate or a product out of the relation information
file, based on the compound number read out in
the first step.

Further, the process of the third step is carried out
to read a compound number of another compound
constituting a reaction system together with the enzyme of
the enzyme number read out in the second step and the
compound, and the additional information about the
enzyme out of the enzyme information file. Then the
process of the fourth step is carried out to indicate the
reaction scheme diagram of the compound accepted
through the input means on the display means from the
compound number read out in the first step, the enzyme
number read out in the second step, and the compound
number of the other compound read out in the third
step. Similarly, the additional information about the
enzyme read out in the third step is also indicated on the
display means.
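
As an illustration only, the first to fourth steps described above can be sketched in Python as follows. The dictionaries standing in for the compound, enzyme, and relation information files, the placeholder canonical strings, the sample record numbers, and the function names `detect_reaction_scheme` and `format_scheme` are assumptions made for this sketch and do not appear in the specification; the one-line text rendering merely stands in for the reaction scheme diagram.

```python
# Hypothetical in-memory stand-ins for the three files (not from the specification).
# Compound information file: compound number -> (canonical data, additional information)
COMPOUND_FILE = {
    1: ("canon:ethanol", {"name": "ethanol"}),
    2: ("canon:acetaldehyde", {"name": "acetaldehyde"}),
}
# Enzyme information file: enzyme number -> substrates, products, additional information
ENZYME_FILE = {
    10: {"substrates": [1], "products": [2],
         "info": {"name": "alcohol dehydrogenase"}},
}
# Relation information file: compound number (key) -> enzyme numbers by role
RELATION_FILE = {
    1: {"as_substrate": [10], "as_product": []},
    2: {"as_substrate": [], "as_product": [10]},
}


def detect_reaction_scheme(canonical_data):
    """Return one record per enzyme that has the compound as substrate or product."""
    # First step: search the compound information file by canonical data.
    number = next((n for n, (canon, _) in COMPOUND_FILE.items()
                   if canon == canonical_data), None)
    if number is None:
        return []  # canonical data is not registered
    # Second step: read enzyme numbers from the relation information file.
    relation = RELATION_FILE.get(number, {"as_substrate": [], "as_product": []})
    schemes = []
    for enzyme in relation["as_substrate"] + relation["as_product"]:
        # Third step: read the other compounds and the enzyme information.
        record = ENZYME_FILE[enzyme]
        schemes.append({
            "compound": number,
            "enzyme": enzyme,
            "substrates": record["substrates"],
            "products": record["products"],
            "info": record["info"],
        })
    return schemes


def compound_name(number):
    """Look up a display name in the additional compound information."""
    return COMPOUND_FILE[number][1]["name"]


def format_scheme(scheme):
    """Fourth step, as a text stand-in for the diagram."""
    left = " + ".join(compound_name(n) for n in scheme["substrates"])
    right = " + ".join(compound_name(n) for n in scheme["products"])
    return f"{left} --[{scheme['info']['name']}]--> {right}"


for scheme in detect_reaction_scheme("canon:acetaldehyde"):
    print(format_scheme(scheme))
```

Keying the relation information file by compound number means the second step is a single lookup rather than a scan over every enzyme record, which appears to be the purpose of keeping that file separate from the enzyme information file.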

In the biochemical information processing method
of the present invention,

said storage means may further store biochemical
information about a receptor, and
may further comprise a receptor information file
storing a list showing the relation between receptor
numbers of the receptors and compound numbers
of compounds being agonists and/or antagonists
for said receptors, and additional information about
said receptors, and in this case;
said relation information file stores a list showing the
relation among the compound numbers of the
compounds as a key, the enzyme numbers of the
enzymes with either said compound being a
substrate, the enzyme numbers of the enzymes with
either said compound being a product, the receptor
numbers of the receptors with either said
compound being an agonist, and the receptor numbers
of the receptors with either said compound being an
antagonist; and

said biochemical information processing method 
further comprises 

a fifth step for, when said input means accepts
data about a compound, preparing said canonical
data uniquely indicating a chemical structure
of said compound from the data, further
searching said compound information file,
based on said canonical data, and thereby
reading out a compound number corresponding
to said canonical data when said canonical
data exists in said compound information file,
a sixth step for reading, based on the compound
number read out in said fifth step, a
receptor number of a receptor with the compound
being an agonist or an antagonist out of
said relation information file,
a seventh step for reading at least additional
information about the receptor of the receptor
number read out in said sixth step out of said
receptor information file, and
an eighth step for indicating at least the additional
information about the receptor read out in
said seventh step on said display means.

In this case, in the biochemical information
processing method of the present invention, the
processes of the fifth step to the eighth step make it possible
to detect receptor information. In the detection of receptor
information, first, the process of the fifth step is carried out to
prepare canonical data from the data about the compound
accepted through the input means. Then the
compound information file is searched based on the
canonical data prepared, and if the canonical data
exists in the compound information file, a compound
number corresponding to the canonical data is read out
therefrom. Next, the process of the sixth step is carried out
to read a receptor number of a receptor with the compound
being an agonist or an antagonist, based on the
compound number read out in the fifth step, out of the
relation information file. Further, the process of the seventh
step is carried out to read at least the additional
information about the receptor of the receptor number
read out in the sixth step out of the receptor information
file. Then the process of the eighth step is carried out to
display at least the additional information about the
receptor read out in the seventh step on the display
means.
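
A minimal sketch of the fifth to eighth steps, under stated assumptions, might look as follows in Python. The dictionary layouts, record numbers, placeholder canonical strings, and the function name `detect_receptor_info` are hypothetical and chosen only for illustration; the sample pairing (acetylcholine as an agonist and atropine as an antagonist of the muscarinic acetylcholine receptor) is ordinary textbook pharmacology rather than data taken from the specification.

```python
# Hypothetical sample data; record numbers and canonical strings are placeholders.
# Compound information file: compound number -> (canonical data, additional information)
COMPOUND_FILE = {
    1: ("canon:acetylcholine", {"name": "acetylcholine"}),
    2: ("canon:atropine", {"name": "atropine"}),
}
# Receptor information file: receptor number -> agonists, antagonists, information
RECEPTOR_FILE = {
    100: {"agonists": [1], "antagonists": [2],
          "info": {"name": "muscarinic acetylcholine receptor"}},
}
# Relation information file extended with receptor numbers, keyed by compound number
RELATION_FILE = {
    1: {"as_agonist": [100], "as_antagonist": []},
    2: {"as_agonist": [], "as_antagonist": [100]},
}


def detect_receptor_info(canonical_data):
    """Return (role, receptor number, additional information) tuples."""
    # Fifth step: canonical data -> compound number.
    number = next((n for n, (canon, _) in COMPOUND_FILE.items()
                   if canon == canonical_data), None)
    if number is None:
        return []
    # Sixth step: receptor numbers from the relation information file.
    relation = RELATION_FILE.get(number, {"as_agonist": [], "as_antagonist": []})
    results = []
    for role, key in (("agonist", "as_agonist"), ("antagonist", "as_antagonist")):
        for receptor in relation[key]:
            # Seventh step: additional information from the receptor file.
            results.append((role, receptor, RECEPTOR_FILE[receptor]["info"]))
    return results


# Eighth step, as plain text output instead of a display screen.
for role, receptor, info in detect_receptor_info("canon:atropine"):
    print(f"{info['name']} (receptor {receptor}): compound acts as {role}")
```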

The biochemical information processing method of 
the present invention may further comprise 

a ninth step for, when said input means accepts
data about a predetermined compound selected
from a plurality of compounds constituting a
reaction path, preparing said canonical data uniquely
indicating a chemical structure of said compound
from the data, further searching said compound
information file, based on the canonical data, and
thereby reading out a compound number
corresponding to said canonical data when said
canonical data exists in said compound information file,
a tenth step for reading, based on the compound
number read out in said ninth step, an enzyme
number of an enzyme with the compound being a
substrate and an enzyme number of an enzyme
with the compound being a product out of said
relation information file,
an eleventh step for reading, based on each
enzyme number read out in said tenth step, a
compound number of a compound being a substrate for
said enzyme and a compound number of a
compound being a product by said enzyme out of said
enzyme information file,
a twelfth step for repeating a process by said tenth
step and a process by said eleventh step to obtain
compounds and enzymes within the predetermined
reaction path, and
a thirteenth step for indicating from the enzyme
numbers read out in said tenth step and compound
numbers read out in said eleventh step a reaction
scheme diagram of these compounds along the
reaction path on said display means.

In this case, in the biochemical information
processing method of the present invention, the
processes of the ninth step to the twelfth step make it possible to
detect a reaction path. In the detection of a reaction path,
first, the process of the ninth step is carried out to
prepare canonical data from the data about the
predetermined compound accepted through the input means.
Then the compound information file is searched based on
the canonical data thus prepared, and if the canonical
data exists in the compound information file, a
compound number corresponding to the canonical data is
read out therefrom. Next, the process of the tenth step is
carried out to read an enzyme number of an enzyme
with this compound being a substrate and an enzyme
number of an enzyme with this compound being a
product, based on the compound number read out in the
ninth step, out of the relation information file. Further,
the process of the eleventh step is carried out to read,
based on each enzyme number read out in the tenth
step, a compound number of a compound being a
substrate for this enzyme and a compound number of a
compound being a product of this enzyme out of the
enzyme information file. The processes of the tenth step
and the eleventh step are repeated in the twelfth step.
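
The repetition of the tenth and eleventh steps can be pictured as a graph traversal over the relation and enzyme information files. The following Python sketch uses a breadth-first traversal, which is a choice made here for illustration; the specification does not name a particular traversal order. The sample records, the function name `detect_reaction_path`, and the `max_rounds` bound (a stand-in for staying "within the predetermined reaction path") are assumptions. The ninth step is omitted, so the function starts from a compound number that has already been looked up.

```python
from collections import deque

# Hypothetical sample data: compound 1 -> 2 via enzyme 10, compound 2 -> 3 via enzyme 11.
ENZYME_FILE = {
    10: {"substrates": [1], "products": [2]},
    11: {"substrates": [2], "products": [3]},
}
RELATION_FILE = {
    1: {"as_substrate": [10], "as_product": []},
    2: {"as_substrate": [11], "as_product": [10]},
    3: {"as_substrate": [], "as_product": [11]},
}


def detect_reaction_path(start, max_rounds=10):
    """Collect (substrate, enzyme, product) edges reachable from a compound number."""
    seen_compounds = {start}
    seen_enzymes = set()
    edges = []
    queue = deque([(start, 0)])
    while queue:  # twelfth step: repeat until nothing new is found
        compound, depth = queue.popleft()
        if depth >= max_rounds:
            continue
        relation = RELATION_FILE.get(compound, {"as_substrate": [], "as_product": []})
        # Tenth step: enzymes with this compound as a substrate or as a product.
        for enzyme in relation["as_substrate"] + relation["as_product"]:
            if enzyme in seen_enzymes:
                continue
            seen_enzymes.add(enzyme)
            # Eleventh step: substrates and products of that enzyme.
            record = ENZYME_FILE[enzyme]
            for substrate in record["substrates"]:
                for product in record["products"]:
                    edges.append((substrate, enzyme, product))
            for other in record["substrates"] + record["products"]:
                if other not in seen_compounds:
                    seen_compounds.add(other)
                    queue.append((other, depth + 1))
    return sorted(edges)


print(detect_reaction_path(2))
```

The returned edge list contains the enzyme numbers and compound numbers from which a diagram along the reaction path could then be drawn.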

Then the process of the thirteenth step is carried
out to indicate, from the enzyme numbers read out in the
tenth step and the compound numbers read out in the
eleventh step, the reaction scheme diagram of these
compounds along a reaction path on the display means.

Further, the biochemical information processing
method of the present invention may be the following
one. Namely, the method may be a biochemical
information processing method using an information
processing apparatus comprising

storage means for storing biochemical information
about compounds and enzymes,
input means for accepting input of image data
indicating said biochemical information or symbolic
data indicating said biochemical information, and
display means for indicating at least a reaction
scheme diagram of a chemical reaction scheme;

wherein said storage means comprises

a compound information file storing a list
showing the relation between compound numbers of
the compounds and canonical data
corresponding to said compounds, and additional
information about said compounds,
an enzyme information file storing a list
showing the relation among enzyme numbers of the
enzymes, compound numbers of compounds
being substrates for said enzymes, and
compound numbers of compounds being products
by said enzymes, and additional information
about said enzymes, and
a relation (correlation) information file storing a
list showing the relation among compound
numbers of compounds as a key, enzyme
numbers of enzymes with either said compound
being a substrate, and enzyme numbers of
enzymes with either said compound being a
product; and

wherein said biochemical information
processing method comprises

a ninth step for, when said input means
accepts data about a predetermined
compound selected from a plurality of
compounds constituting a reaction path,
preparing said canonical data uniquely
indicating a chemical structure of said
compound from the data, further searching
said compound information file, based on
the canonical data, and thereby reading
out a compound number corresponding to
said canonical data when said canonical
data exists in said compound information
file,
a tenth step for reading, based on the
compound number read out in said ninth step,
an enzyme number of an enzyme with the
compound being a substrate and an
enzyme number of an enzyme with the
compound being a product out of said
relation information file,
an eleventh step for reading, based on
each enzyme number read out in said
tenth step, a compound number of a
compound being a substrate for said enzyme
and a compound number of a compound
being a product by said enzyme out of said
enzyme information file,
a twelfth step for repeating a process by
said tenth step and a process by said
eleventh step to obtain compounds and
enzymes within the predetermined
reaction path, and
a thirteenth step for indicating from
enzyme numbers read out in said tenth
step and compound numbers read out in
said eleventh step a reaction scheme
diagram of these compounds along the
reaction path on said display means.

In this case, in the biochemical information
processing method of the present invention,

said storage means may further store biochemical
information about receptors, and
may further comprise a receptor information file
storing a list showing the relation between receptor
numbers of the receptors and compound numbers
of compounds being agonists and/or antagonists
for said receptors, and additional information about
said receptors, and in this case;
said relation information file stores a list showing the
relation among the compound numbers of the
compounds as a key, the enzyme numbers of the
enzymes with either said compound being a
substrate, the enzyme numbers of the enzymes with
either said compound being a product, the receptor
numbers of the receptors with either said
compound being an agonist, and the receptor numbers
of the receptors with either said compound being an
antagonist; and

said biochemical information processing method 
further comprises 

a fifth step for, when said input means accepts
data about a compound, preparing said canonical
data uniquely indicating a chemical structure
of said compound from the data, further
searching said compound information file,
based on said canonical data, and thereby
reading out a compound number corresponding
to said canonical data when said canonical
data exists in said compound information file,
a sixth step for reading, based on the compound
number read out in said fifth step, a
receptor number of a receptor with the compound
being an agonist or an antagonist out of
said relation information file,
a seventh step for reading at least additional
information about the receptor of the receptor
number read out in said sixth step out of said
receptor information file, and
an eighth step for indicating at least the additional
information about the receptor read out in
said seventh step on said display means.

Further, in the biochemical information processing
method of the present invention, preferably, said input
means accepts input of characteristic data about each
of the atoms constituting a compound and bonding pair
data between atoms; and

said biochemical information processing method
further comprises

a constituent atom classification step for classifying,
based on each data accepted through
said input means, the atoms into different
classes each for equivalent atoms and assigning,
to each atom, a different class number for
each class,

a canonical number assignment step for
assigning canonical numbers uniquely corresponding
to the structure of said compound to
the respective atoms, based on the class numbers
assigned to the respective atoms in said
constituent atom classification step, and
a canonical data preparation step for preparing
said canonical data that makes it possible to uniquely
specify a chemical structure of said compound,
based on the canonical numbers assigned to
the respective atoms in said canonical number
assignment step.

By the various steps for preparing the canonical
data according to the present invention having such a
structure, the canonical data is prepared based on the
characteristic data about each atom and the bonding
pair data between atoms accepted through the input
means.

Namely, first, in the constituent atom classification
step, the atoms are classified into different classes each
for equivalent atoms, based on the characteristic data
about each atom and the bonding pair data between
atoms. Then a different class number for each class is
assigned to each atom. Next, in the canonical number
assignment step, the canonical numbers uniquely
corresponding to the structure of the compound are assigned
to the respective atoms, based on the class numbers
given to the respective atoms and the bonding pair data
between atoms. Further, in the canonical data
preparation step, the canonical data is prepared based on the
canonical numbers given to the respective atoms and
the characteristic data about each atom.

Here, preferably, said constituent atom classification
step assigns three types of attributes (a_i, b_ij, d_ij) to
each atom and, utilizing the fact that atoms differing in
even one of these attributes can be determined not to
be equivalent, assigns to each atom a class number that
differs for each class of equivalent atoms,

where among said three types of attributes (a_i,
b_ij, d_ij), a_i is a kind number of an atom of input number i,
b_ij is the number of bonds adjoining the atom of input
number i and having a bond kind number j, and d_ij
is the number of routes that can be traced from the atom
of input number i through j bonds in the shortest path;

said canonical number assignment step is arranged
so that, in a process for assigning a canonical
number to each atom in ascending order from 1,
the canonical number 1 is given to an atom with the
highest priority of said class number and, after
canonical numbers up to the canonical number n
have been assigned in that manner, said canonical number
assignment step selects an atom with a minimum
canonical number out of atoms already having their
respective canonical numbers and bonding to an
atom having no canonical number yet, and then
gives a canonical number n + 1 to an atom with the
highest priority of said class number out of atoms
bonding to said selected atom and having no
canonical number yet; and
said canonical data preparation step gives three
types of attributes (P_i, T_i, S_i) to each atom and
aligns these attributes in a line to prepare said
canonical data,

where among said three types of attributes
(P_i, T_i, S_i), P_i is a canonical number of an atom
bonding to an atom of canonical number i and
having a minimum canonical number, T_i is a symbol for
a type of a bond between the atom of canonical
number i and the atom of canonical number P_i, and
S_i is a symbol for a kind of the atom of canonical
number i.
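
The three steps can be sketched in Python as below. This is a rough illustration under several assumptions, not the procedure of the specification: the kind numbers in `KIND_NUMBER` are taken to be atomic numbers, bond kinds are written directly as symbols such as "-" or "=", d_ij is read as the number of atoms whose shortest path from atom i has j bonds, a larger attribute tuple is assumed to mean higher priority, and ties between atoms of the same class are broken by input order. All function names are hypothetical. Because only the bond to P_i is written out for each atom, other bonds (for example ring closures) are not encoded by this sketch, and an empty atom list is not handled.

```python
from collections import deque

# Hypothetical kind numbers for a few elements (here: atomic numbers).
KIND_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def build_adjacency(n_atoms, bonds):
    """bonds: list of (atom index, atom index, bond type symbol)."""
    adj = [[] for _ in range(n_atoms)]
    for i, j, bond in bonds:
        adj[i].append((j, bond))
        adj[j].append((i, bond))
    return adj


def classify_atoms(symbols, bonds):
    """Constituent atom classification: class number per atom (1 = highest priority)."""
    n = len(symbols)
    adj = build_adjacency(n, bonds)
    bond_kinds = sorted({bond for _, _, bond in bonds})
    attrs = []
    for i in range(n):
        a = KIND_NUMBER[symbols[i]]
        b = tuple(sum(1 for _, bond in adj[i] if bond == kind) for kind in bond_kinds)
        # d (assumed reading): number of atoms whose shortest path from i has j bonds.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v, _ in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        d = tuple(sum(1 for x in dist.values() if x == j) for j in range(1, n))
        attrs.append((a, b, d))
    ranking = sorted(set(attrs), reverse=True)  # assumed priority: larger tuple first
    return [ranking.index(t) + 1 for t in attrs]


def assign_canonical_numbers(classes, adj):
    """Canonical number assignment; ties broken by input order (a simplification)."""
    n = len(classes)
    canon = [0] * n
    canon[min(range(n), key=lambda i: (classes[i], i))] = 1
    for number in range(2, n + 1):
        frontier = [i for i in range(n)
                    if canon[i] and any(not canon[v] for v, _ in adj[i])]
        if not frontier:
            break  # disconnected input; not handled in this sketch
        selected = min(frontier, key=lambda i: canon[i])
        candidates = [v for v, _ in adj[selected] if not canon[v]]
        canon[min(candidates, key=lambda v: (classes[v], v))] = number
    return canon


def prepare_canonical_data(symbols, bonds):
    """Canonical data: the (P, T, S) triples lined up in canonical-number order."""
    adj = build_adjacency(len(symbols), bonds)
    canon = assign_canonical_numbers(classify_atoms(symbols, bonds), adj)
    parts = []
    for i in sorted(range(len(symbols)), key=lambda i: canon[i]):
        neighbours = [(canon[v], bond) for v, bond in adj[i]]
        p, t = min(neighbours) if neighbours else (0, "")
        parts.append(f"{p}{t}{symbols[i]}")
    return " ".join(parts)


# The heavy atoms of ethanol, entered in two different orders.
print(prepare_canonical_data(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")]))
print(prepare_canonical_data(["O", "C", "C"], [(0, 1, "-"), (1, 2, "-")]))
```

Both calls print the same string, which is the property that lets the canonical data serve as a search key for the compound information file regardless of the order in which the atoms were entered.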

Next explained is the biochemical information com- 
puter program product (biochemical information record- 
ing medium) of the present invention. 

The biochemical information computer program
product of the present invention is a biochemical
information computer program product used with an
information processing apparatus comprising input means
for accepting input of image data indicating biochemical
information or symbolic data indicating biochemical
information, display means for indicating at least a
reaction scheme diagram of a chemical reaction scheme,
and reading means for reading information out of a
computer-usable medium;

said computer program product comprising the
computer-usable medium having a file area for
recording a file and a program area for recording a
program and having computer-readable file and
program embodied in said medium, for letting at
least a reaction scheme diagram efficiently be
searched for and be indicated by said display
means, based on data input through said input
means;

said computer program product having,
in said file area,
a computer-readable compound information file for
storing a list showing the relation between
compound numbers of compounds and canonical data
corresponding to said compounds, and additional
information about said compounds,
a computer-readable enzyme information file for
storing a list showing the relation among enzyme
numbers of enzymes, compound numbers of
compounds being substrates for said enzymes, and
compound numbers of compounds being products
by said enzymes, and additional information about
said enzymes, and

a computer-readable relation (correlation)
information file for storing a list showing the relation among
the compound numbers of the compounds as a key,
enzyme numbers of enzymes with either said
compound being a substrate, and enzyme numbers of
enzymes with either said compound being a
product, and

having, in said program area,
a computer-readable reaction scheme detection
program for, when said input means accepts data
about a compound being a substrate and/or a
product, detecting a chemical reaction scheme involving
said compound, based on the data;

wherein said reaction scheme detection pro- 
gram comprises 

a first computer-readable process routine for
preparing from the data about a compound
accepted through said input means said
canonical data uniquely indicating a chemical
structure of said compound, further searching
said compound information file, based on the
canonical data, and thereby reading out a
compound number corresponding to said canonical
data when said canonical data exists in said
compound information file,
a second computer-readable process routine
for reading an enzyme number of an enzyme
with the compound being a substrate or a
product out of said relation information file, based
on the compound number read out in said first
process routine,
a third computer-readable process routine for
reading a compound number of another
compound constituting a reaction system together
with the enzyme of the enzyme number read
out in said second process routine and said
compound, and additional information about
said enzyme out of said enzyme information
file, and
a fourth computer-readable process routine for
indicating a reaction scheme diagram of the
compound accepted through said input means
on said display means from the compound
number read out in said first process routine,
the enzyme number read out in said second
process routine, and the compound number of
the other compound read out in said third
process routine, and further indicating the
additional information about the enzyme read out in
said third process routine on said display
means.

In the above biochemical information computer pro- 
gram product of the present invention, the compound 
information file etc. are recorded in the file area and the 
reaction scheme detection program is recorded in the 
program area. 

The reaction scheme detection program can be
executed using the information processing apparatus.
By this execution, first, the process of the first process
routine is carried out to prepare the canonical data from
the data about the compound accepted through the
input means. Then the compound information file is
searched based on the canonical data thus prepared,
and if the canonical data exists in the compound
information file, a compound number corresponding to the
canonical data is read out therefrom.

Next, the process of the second process routine is 
carried out to read an enzyme number of an enzyme 
with this compound being a substrate or a product, 
based on the compound number read out in the first 
process routine, out of the relation information file. Fur- 
ther, the process of the third process routine is carried 
out to read a compound number of another compound 
constituting a reaction system together with the enzyme 
of the enzyme number read out in the second process 
routine and the compound, and the additional informa- 
tion about the enzyme out of the enzyme information 
file. Then the process of the fourth process routine is 
carried out to indicate the reaction scheme diagram of 
the compound accepted through the input means on the 
display means from the compound number read out in 
the first process routine, the enzyme number read out in 
the second process routine, and the compound number
of the other compound read out in the third process
routine. Further, the additional information about the 
enzyme read out in the third process routine is also indi- 
cated on the display means. 

The biochemical information computer program
product of the present invention may further have, in
said file area,

a computer-readable receptor information file
storing a list showing the relation between receptor
numbers of the receptors and compound numbers
of compounds being agonists and/or antagonists
for said receptors, and additional information about
said receptors;
said relation information file stores a list showing the
relation among the compound numbers of the
compounds as a key, the enzyme numbers of the
enzymes with either said compound being a
substrate, the enzyme numbers of the enzymes with
either said compound being a product, the receptor
numbers of the receptors with either said
compound being an agonist, and the receptor numbers
of the receptors with either said compound being an
antagonist; and

said computer program product further has, in said
program area,

a computer-readable receptor information detection 
program for, when said input means accepts data 
about a compound, detecting additional information 
about a receptor with said compound being an ago- 
nist and/or an antagonist, based on the data; and 
said receptor information detection program com- 
prises 

a fifth computer-readable process routine for 
preparing from data about a compound 
accepted through said input means said 
canonical data uniquely indicating a chemical
structure of said compound, further searching 
said compound information file, based on said 
canonical data, and thereby reading out a com- 
pound number corresponding to said canonical 
data when said canonical data exists in said 
compound information file, 
a sixth computer-readable process routine for 
reading, based on the compound number read 
out in said fifth process routine, a receptor 
number of a receptor with the compound being 
an agonist or an antagonist out of said relation
information file,

a seventh computer-readable process routine 
for reading at least additional information about 
the receptor of the receptor number read out in 
said sixth process routine out of said receptor 
information file, and 

an eighth computer-readable process routine 
for indicating at least the additional information 
about the receptor read out in said seventh 
process routine on said display means. 

In this case, in the above biochemical information 
computer program product of the present invention, the 
receptor information detection program is recorded in 
addition to the reaction scheme detection program in 
the program area. 

The receptor information detection program can be 
executed using the information processing apparatus. 

By this execution, first, the process of the fifth process
routine is carried out to prepare the canonical data
from the data about the compound accepted through
the input means. Then the compound information file is
searched based on the canonical data thus prepared,
and if the canonical data exists in the compound
information file, a compound number corresponding to the
canonical data is read out therefrom.

Next, the process of the sixth process routine is carried
out to read a receptor number of a receptor with this
compound being an agonist or an antagonist, based on
the compound number read out in the fifth process
routine, out of the relation information file. Further, the
process of the seventh process routine is carried out to
read at least the additional information about the
receptor of the receptor number read out in the sixth process
routine out of the receptor information file. Then the
process of the eighth process routine is carried out to
indicate at least the additional information about the
receptor read out in the seventh process routine on the
display means.

The biochemical information computer program
product of the present invention may further have, in
said program area,

a computer-readable reaction path detection
program for, when said input means accepts data
about a predetermined compound selected from a
plurality of compounds constituting a reaction path,
detecting the reaction path of said plurality of
compounds, based on the data, and in this case;
said reaction path detection program comprises

a ninth computer-readable process routine for
preparing from the data about the compound
accepted through said input means said
canonical data uniquely indicating a chemical
structure of said compound, further searching
said compound information file, based on the
canonical data, and thereby reading out a
compound number corresponding to said canonical
data when said canonical data exists in said
compound information file,
a tenth computer-readable process routine for
reading, based on the compound number read
out in said ninth process routine, an enzyme
number of an enzyme with the compound
being a substrate and an enzyme number of an
enzyme with the compound being a product out
of said relation information file,
an eleventh computer-readable process
routine for reading, based on each enzyme
number read out in said tenth process routine,
a compound number of a compound being a
substrate for said enzyme and a compound
number of a compound being a product by said
enzyme out of said enzyme information file,
a twelfth computer-readable process routine for
repeating a process by said tenth process
routine and a process by said eleventh process
routine to obtain compounds and enzymes
within the predetermined reaction path, and
a thirteenth computer-readable process routine
for indicating from enzyme numbers read out
in said tenth process routine and compound
numbers read out in said eleventh process
routine a reaction scheme diagram of these
compounds along the reaction path on said display
means.

In this case, in the above biochemical information
computer program product of the present invention, the
reaction path detection program is recorded in addition
to the reaction scheme detection program and the
receptor information detection program in the program
area.

The reaction path detection program can be
executed using the information processing apparatus.

By this execution, first, the process of the ninth
process routine is carried out to prepare the canonical
data from the data about the predetermined compound
accepted through the input means. Then the compound
information file is searched based on the canonical data
thus prepared, and if the canonical data exists in the
compound information file, a compound number
corresponding to the canonical data is read out therefrom.

Next, the process of the tenth process routine is
carried out to read an enzyme number of an enzyme
with the compound being a substrate and an enzyme
number of an enzyme with the compound being a
product, based on the compound number read out in the
ninth process routine, out of the relation information file.
Further, the process of the eleventh process routine is
carried out to read, based on each enzyme number
read out in the tenth process routine, a compound
number of a compound being a substrate of the enzyme
and a compound number of a compound being a
product of the enzyme out of the enzyme information file.
The processes of the tenth process routine and the
eleventh process routine are repeated in the twelfth
process routine.

Then the process of the thirteenth process routine 
is carried out to indicate a reaction scheme diagram of
these compounds along a reaction path on the display 
means from the enzyme numbers read out in the tenth 
process routine and the compound numbers read out in 
the eleventh process routine. 

Further, the biochemical information computer pro- 
gram product of the present invention may be the follow- 
ing one. Namely, the product may be a biochemical 
information computer program product used with an 
information processing apparatus comprising input 
means for accepting input of image data indicating bio- 
chemical information or symbolic data indicating bio- 
chemical information, display means for indicating at 
least a reaction scheme diagram of a chemical reaction 
scheme, and reading means for reading information out 
of a computer-usable medium; 

said computer program product comprising the 
computer-usable medium having a file area for
recording a file and a program area for recording a
program and having computer-readable file and 
program embodied in said medium, for letting at 
least a reaction scheme diagram efficiently be 
searched for and be indicated by said display 
means, based on data input through said input 
means; 

said computer program product having, 
in said file area,

a computer-readable compound information file for 
storing a list showing the relation between com- 
pound numbers of compounds and canonical data 
corresponding to said compounds, and additional 
information about said compounds, 
a computer-readable enzyme information file for 
storing a list showing the relation among enzyme 
numbers of enzymes, compound numbers of com- 
pounds being substrates for said enzymes, and 
compound numbers of compounds being products 
by said enzymes, and additional information about 
said enzymes, and 

a computer-readable relation (correlation) informa- 
tion file for storing a list showing the relation among 
the compound numbers of the compounds as a key, 
enzyme numbers of enzymes with either said com- 
pound being a substrate, and enzyme numbers of 
enzymes with either said compound being a prod- 
uct, and 

having, in said program area, 
a computer-readable reaction path detection pro- 
gram for, when said input means accepts data 
about a predetermined compound selected from a 
plurality of compounds constituting a reaction path, 
detecting the reaction path of said plurality of com- 
pounds, based on the data; 

wherein said reaction path detection pro- 
gram comprises 



a ninth computer-readable process routine for
preparing, from the data about the compound
accepted through said input means, said
canonical data uniquely indicating a chemical
structure of said compound, further searching
said compound information file, based on the
canonical data, and thereby reading out a compound
number corresponding to said canonical
data when said canonical data exists in said
compound information file,

a tenth computer-readable process routine for
reading, based on the compound number read
out in said ninth process routine, an enzyme
number of an enzyme with the compound
being a substrate and an enzyme number of an
enzyme with the compound being a product out
of said relation information file,

an eleventh computer-readable process routine
for reading, based on each enzyme
number read out in said tenth process routine,
a compound number of a compound being a
substrate for said enzyme and a compound
number of a compound being a product by said
enzyme out of said enzyme information file,

a twelfth computer-readable process routine for
repeating a process by said tenth process routine
and a process by said eleventh process
routine to obtain compounds and enzymes
within the predetermined reaction path, and

a thirteenth computer-readable process routine
for indicating, from enzyme numbers read out in
said tenth process routine and compound numbers
read out in said eleventh process routine, a
reaction scheme diagram of these compounds
along the reaction path on said display means.
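
By way of illustration only, the ninth to thirteenth process routines recited above might be sketched in Python as follows. The dictionaries standing in for the compound, enzyme, and relation information files, the key names, the sample records, and the function names `detect_reaction_path` and `render_path` are all assumptions made for this sketch; the actual record layout and the diagram output are not specified at this point in the text.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the three files (not the actual record layout).
COMPOUND_FILE = {"canon-C1": "C1", "canon-C2": "C2", "canon-C3": "C3"}  # canonical data -> compound number
ENZYME_FILE = {"E1": (["C1"], ["C2"]), "E2": (["C2"], ["C3"])}          # enzyme -> (substrates, products)
RELATION_FILE = {                                                       # compound -> related enzymes
    "C1": {"substrate": ["E1"], "product": []},
    "C2": {"substrate": ["E2"], "product": ["E1"]},
    "C3": {"substrate": [], "product": ["E2"]},
}


def detect_reaction_path(canonical_data):
    """Return (compounds, enzymes) reachable from the compound with this canonical data."""
    start = COMPOUND_FILE.get(canonical_data)           # ninth routine: canonical data -> compound number
    if start is None:
        return [], []
    compounds, enzymes = [start], []
    queue = deque([start])
    while queue:                                        # twelfth routine: repeat the tenth and eleventh
        entry = RELATION_FILE.get(queue.popleft(), {})
        for enzyme in entry.get("substrate", []) + entry.get("product", []):  # tenth routine
            if enzyme in enzymes:
                continue
            enzymes.append(enzyme)
            substrates, products = ENZYME_FILE[enzyme]  # eleventh routine
            for compound in substrates + products:
                if compound not in compounds:
                    compounds.append(compound)
                    queue.append(compound)
    return compounds, enzymes


def render_path(enzymes):
    """Thirteenth routine stand-in: one text line per reaction instead of a diagram."""
    return [f"{'+'.join(ENZYME_FILE[e][0])} --{e}--> {'+'.join(ENZYME_FILE[e][1])}" for e in sorted(enzymes)]
```

In this sketch the relation information file acts as an index from a compound to its enzymes, so each step of the search is a keyed lookup rather than a scan of the enzyme information file.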

In this case, the biochemical information computer
program product of the present invention may further
have, in said file area,

a computer-readable receptor information file storing
a list showing the relation between a receptor
number of a receptor and a compound number of a 
compound being an agonist and/or an antagonist of 
said receptor, and additional information about said 
receptor, and in this case; 

said relation information file stores a list to show the 
relation among a compound number of a com- 
pound as a key, an enzyme number of an enzyme 
with said compound being a substrate, an enzyme 
number of an enzyme with said compound being a
product, a receptor number of a receptor with said 
compound being an agonist, and a receptor 
number of a receptor with said compound being an 
antagonist; and 

said computer program product further has, in said
program area, 

a computer-readable receptor information detection 
program for, when said input means accepts data 
about a compound, detecting additional information 
about a receptor with said compound being an ago- 
nist and/or an antagonist, based on the data; and 
said receptor information detection program com- 
prises 

a fifth computer-readable process routine for
preparing, from data about a compound
accepted through said input means, said
canonical data uniquely indicating a chemical
structure of said compound, searching said
compound information file, based on this
canonical data, and reading out a compound
number corresponding to said canonical data
when said canonical data exists in said
compound information file,

a sixth computer-readable process routine for
reading, based on the compound number read
out in said fifth process routine, a receptor
number of a receptor with the compound being
an agonist or an antagonist out of said relation
information file,

a seventh computer-readable process routine
for reading at least additional information about
a receptor of the receptor number read out in
said sixth process routine out of said receptor
information file, and

an eighth computer-readable process routine
for indicating at least the additional information
about the receptor read out in said seventh
process routine on said display means.
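
As a rough sketch under stated assumptions, the fifth to eighth process routines recited above amount to three keyed lookups followed by an output step. The dictionary layouts, the placeholder receptor names, and the function name `detect_receptor_info` below are hypothetical and serve only to make the sequence concrete.

```python
# Hypothetical stand-ins for the files; the real record layout is not given here.
COMPOUND_FILE = {"canon-C6": "C6", "canon-C7": "C7"}    # canonical data -> compound number
RELATION_FILE = {                                       # compound -> receptors by role
    "C6": {"agonist": ["R1"], "antagonist": []},
    "C7": {"agonist": [], "antagonist": ["R4"]},
}
RECEPTOR_FILE = {                                       # receptor -> additional information
    "R1": {"name": "receptor 1 (placeholder)"},
    "R4": {"name": "receptor 4 (placeholder)"},
}


def detect_receptor_info(canonical_data):
    """Return (receptor number, role, name) tuples for the given compound."""
    compound = COMPOUND_FILE.get(canonical_data)        # fifth routine
    if compound is None:
        return []
    entry = RELATION_FILE.get(compound, {})
    results = []
    for role in ("agonist", "antagonist"):              # sixth routine
        for receptor in entry.get(role, []):
            info = RECEPTOR_FILE[receptor]              # seventh routine
            results.append((receptor, role, info["name"]))
    return results                                      # eighth routine: the caller displays these
```

Returning the rows instead of printing them keeps the display step separate, which mirrors the split between the seventh and eighth routines.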

Further, in the biochemical information computer
program product of the present invention, preferably,

said input means accepts input of characteristic 
data about each of atoms constituting a compound 
and bonding pair data between atoms; and 
said computer program product further has, in said
program area,

a computer-readable canonical data preparation
program for preparing canonical data capable of
uniquely specifying a chemical structure of said
compound, based on each data accepted through
said input means. Namely, said canonical data
preparation program comprises

a computer-readable constituent atom classification
routine for classifying the atoms into different
classes each for equivalent atoms and
assigning, to each atom, a different class
number for each class,

a computer-readable canonical number
assignment routine for assigning canonical
numbers uniquely corresponding to the structure
of said compound to the respective atoms,
based on the class numbers assigned to the
respective atoms in said constituent atom
classification routine, and

a computer-readable canonical data preparation
routine for preparing said canonical data,
based on the canonical numbers assigned to
the respective atoms in said canonical number
assignment routine.

By setting the biochemical information computer
program product according to the present invention having
such structure in a predetermined information
processing apparatus and reading the canonical data
preparation program stored in the program area, the
canonical data preparation program can be executed by
the information processing apparatus. By start of the
canonical data preparation program, the constituent
atom classification routine is first carried out to classify
the atoms into different classes each for equivalent
atoms, based on the characteristic data about each
atom and the bonding pair data between atoms. Then a
different class number for each class is assigned to
each atom. Then the canonical number assignment routine
is carried out to assign canonical numbers uniquely
corresponding to the structure of the compound to the
respective atoms, based on the class numbers given to
the respective atoms and the bonding pair data
between atoms. Further, the canonical data preparation
routine is carried out to prepare the canonical data
based on the canonical numbers given to the respective
atoms and the characteristic data about each atom.

Here, preferably, said constituent atom classification
routine assigns three types of attributes (a_i, b_ij, d_ij)
to each atom and, utilizing the fact that atoms different
in even only one of these attributes can be determined
to be not equivalent, assigns a different class number
for each equivalent atom to each atom,

where among said three types of attributes (a_i,
b_ij, d_ij), a_i is a kind number of an atom of input number i,
b_ij is the number of bonds adjoining the atom of input
number i and having a bond kind number being j, and d_ij
is the number of routes that can be traced from the atom
of input number i through j bonds in the shortest path;

said canonical number assignment routine is
arranged so that when in a process for assigning a
canonical number to each atom in the ascending
order from 1 the canonical number 1 is given to an
atom with a highest priority of said class number
and thereafter canonical numbers up to the canonical
number n are assigned in that manner, said
canonical number assignment routine selects an
atom with a minimum canonical number out of
atoms already having their respective canonical
numbers and bonding to an atom having no canonical
number yet and then gives a canonical number
n + 1 to an atom with a highest priority of said class
number out of atoms bonding to said selected atom
and having no canonical number yet; and

said canonical data preparation routine gives three
types of attributes (P_i, T_i, S_i) to each atom and
aligns these attributes in line to prepare said canonical
data,

where among said three types of attributes
(P_i, T_i, S_i), P_i is a canonical number of an atom
bonding to an atom of canonical number i and having
a minimum canonical number, T_i is a symbol for
a type of a bond between the atom of canonical
number i and the atom of canonical number P_i, and
S_i is a symbol for a kind of the atom of canonical
number i.
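
The canonical number assignment and canonical data preparation steps described above can be sketched as follows. This is a minimal illustration, not the routines of the invention: class numbers are taken as given input (the classification step is not reproduced), a smaller class number is assumed to mean a higher priority, ties are broken by input number, every atom is assumed to have at least one bond, and the function names and the three-atom sample structure are hypothetical.

```python
def assign_canonical_numbers(class_numbers, bonds):
    """class_numbers: {atom: class number}; bonds: {frozenset({a, b}): bond symbol}."""
    neighbours = {atom: [] for atom in class_numbers}
    for pair in bonds:
        a, b = tuple(pair)
        neighbours[a].append(b)
        neighbours[b].append(a)

    def priority(atom):
        # Assumption: smaller class number = higher priority; input number breaks ties.
        return (class_numbers[atom], atom)

    canonical = {min(class_numbers, key=priority): 1}   # canonical number 1
    while len(canonical) < len(class_numbers):
        # Numbered atoms that still bond to an atom without a canonical number.
        open_atoms = [a for a in canonical if any(n not in canonical for n in neighbours[a])]
        if not open_atoms:
            raise ValueError("structure is not connected")
        anchor = min(open_atoms, key=canonical.get)     # minimum canonical number among them
        candidates = [n for n in neighbours[anchor] if n not in canonical]
        chosen = min(candidates, key=priority)
        canonical[chosen] = len(canonical) + 1          # canonical number n + 1
    return canonical


def prepare_canonical_data(canonical, symbols, bonds):
    """Return the (P_i, T_i, S_i) triples in order of canonical number i."""
    atom_of = {number: atom for atom, number in canonical.items()}
    rows = []
    for i in range(1, len(atom_of) + 1):
        atom = atom_of[i]
        bonded = [(canonical[other], bond_type)
                  for pair, bond_type in bonds.items() if atom in pair
                  for other in pair - {atom}]
        p_i, t_i = min(bonded)                          # bonded atom with the minimum canonical number
        rows.append((p_i, t_i, symbols[atom]))
    return rows


# Hypothetical three-atom chain C-C-O with made-up class numbers.
SYMBOLS = {0: "C", 1: "C", 2: "O"}
BONDS = {frozenset({0, 1}): "-", frozenset({1, 2}): "-"}
CLASSES = {2: 1, 1: 2, 0: 3}
NUMBERS = assign_canonical_numbers(CLASSES, BONDS)
ROWS = prepare_canonical_data(NUMBERS, SYMBOLS, BONDS)
```

The sketch returns the triples as a list; how they are aligned into a single symbolic string is not assumed here.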

The computer-usable medium according to the 
present invention is preferably a disk type recording 
medium or a tape type recording medium. 

Brief Description of Drawings 

Fig. 1 is a block diagram to show the structure of an
example of the biochemical information processing
apparatus of the present invention.

Fig. 2 is an example of a reaction path diagram to
show a path in which a compound of compound number
C1 changes up to a compound of compound number

Fig. 3 is a drawing to show the structure of a compound
information file.

Fig. 4 is a drawing to show the structure of an
enzyme information file.

Fig. 5 is a drawing to show the structure of a receptor
information file.

Fig. 6 is a drawing to show the structure of an
example of the relation information file according to the
present invention.

Fig. 7 is a drawing to show the flow of data in the
biochemical information processing apparatus.

Fig. 8A is a drawing to show a specific example of
image data, Fig. 8B a specific example of bond table
data, and Fig. 8C a specific example of canonical data,
respectively.

Fig. 9A is a drawing to show a specific example of
image data, Fig. 9B a specific example of bond table
data, and Fig. 9C a specific example of canonical data,
respectively.

Figs. 10A-10C are drawings to show the relationship
between image data and canonical data.

Fig. 11 is a flowchart to show the flow of process of
a main routine.

Fig. 12 is a flowchart to show the flow of process of
a three-dimensional indication routine.

Fig. 13 is a flowchart to show the flow of process of
a reaction scheme detection routine.

Fig. 14 is a flowchart to show the flow of process of
a reaction path detection routine.

Fig. 15 is a flowchart to show the flow of process of
the reaction path detection routine.

Fig. 16 is a drawing to show an example of indication
on a display.

Fig. 17 is a drawing to show another example of
indication on the display.

Fig. 18A is a drawing to show the contents of an
atomic table in the bond table, and Fig. 18B is a drawing
to show the contents of an atomic pair table in the bond
table.

Fig. 19 is a schematic drawing to show the schematic
operation of a canonical data preparing apparatus.

Fig. 20 is a flowchart to show the schematic process
of the main routine.

Fig. 21 is a flowchart to show the schematic process
of the constituent atom classification routine.

Fig. 22A is a drawing to show the contents of an
atomic table in the bond table, and Fig. 22B is a drawing
to show the contents of an atomic pair table in the bond
table.

Fig. 23 is a drawing to show the relationship
between each of the atoms constituting 3, 5-dimethyl-2,
3, 4, 5-tetrahydropyridine and an input number thereof.

Figs. 24A and 24B are drawings each showing the
data contents of the reference table. 

Fig. 25 is a drawing to show three types of attributes
(a_i, b_ij, d_ij) given to each of the atoms constituting
3, 5-dimethyl-2, 3, 4, 5-tetrahydropyridine.

Figs. 26A and 26B are drawings each showing the 
data contents of the reference table. 

Fig. 27 is a drawing to show the data contents of the 
reference table. 

Figs. 28A and 28B are drawings each showing the 
data contents of the reference table. 

Figs. 29A and 29B are drawings each showing the 
data contents of the reference table. 

Figs. 30A-30C are drawings to show the relationship
between each of the atoms constituting
3, 5-dimethyl-2, 3, 4, 5-tetrahydropyridine and a class number
thereof.

Fig. 31 is a drawing to show attributes Vy^ given to 
the respective atoms constituting 3, 5-dimethyl-2, 3, 4, 
5-tetrahydropyridine. 

Fig. 32 is a drawing to show attributes Vy^ given to 
the respective atoms constituting 3, 5-dimethyl-2, 3, 4,
5-tetrahydropyridine. 

Fig. 33 is a flowchart to show the schematic process
of a canonical number assignment routine.

Fig. 34 is a drawing to show the relationship 
between each of the atoms constituting 3, 5-dimethyl-2,
3, 4, 5-tetrahydropyridine and a canonical
number thereof. 

Fig. 35 is a flowchart to show the schematic proc- 
ess of a canonical data preparation routine. 

Fig. 36A is a drawing to show the contents of an 
atomic table in the bond table, and Fig. 36B is a drawing 
to show the contents of an atomic pair table in the bond 
table. 

Fig. 37 is a drawing to show the data contents of 
canonical tree structure data. 

Fig. 38A is a molecular structure diagram of C60,
and Fig. 38B is canonical data thereof. 

Fig. 39 is a block diagram to show the structure of 
another example of the biochemical information 
processing apparatus of the present invention. 

Fig. 40 is a block diagram to show the structure of 
an example of the canonical data preparing apparatus 
according to the present invention. 

Fig. 41 is a block diagram to show the structure of 
still another example of the biochemical information 
processing apparatus of the present invention. 

Fig. 42 is a drawing to show the structure of another 
example of the relation information file according to the 
present invention. 

Fig. 43 is a flowchart to show the flow of process of 
another example of the main routine. 

Fig. 44 is a block diagram to show the structure of 
an example of the biochemical information storage 
medium of the present invention. 

Fig. 45 is a block diagram to show the structure of
an example of the biochemical information processing
apparatus according to the present invention.

Fig. 46 is a perspective view to show an example of
the biochemical information processing apparatus
according to the present invention.

Fig. 47 is a block diagram to show the structure of
another example of the biochemical information storage
medium of the present invention.

Fig. 48 is a block diagram to show the structure of
an example of a recording medium for preparation of
canonical data according to the present invention.

Fig. 49 is a block diagram to show the structure of
another example of the canonical data preparing apparatus
according to the present invention.

Fig. 50 is a block diagram to show the structure of
still another example of the biochemical information
storage medium of the present invention.

Best Mode for Carrying Out the Invention 


The preferred embodiments of the present invention
will be described with reference to the accompanying
drawings. Fig. 1 is a block diagram to show the
structure of the biochemical information processing
apparatus 1 according to an embodiment of the present
invention. Referring to the drawing, the biochemical
information processing apparatus 1 of the present
embodiment comprises an image memory 10 for storing
image data to indicate a molecular structure diagram or
the like of a compound, a work memory 11 for temporarily
storing data, a first storage device 20 for storing an
operating system (OS) 21 and a biochemical information
processing program 22, and a second storage
device 30, being storage means, for storing various
files. Further, it comprises a display 40 being display
means, an input device 50, which is input means, having
a mouse 51 for accepting input of image data and a
keyboard 52 for accepting input of symbolic data, a
printer 60 for outputting the image data or the like, and
a CPU 70 for controlling execution or the like of the
biochemical information processing program 22.

The biochemical information processing program
22 comprises a main program 23 for generally controlling
processing, a three-dimensional indication program
24 for effecting three-dimensional indication of image
data, a reaction scheme detection program 25 being
reaction scheme detection means, a receptor information
detection program 26 being receptor information
detection means, and a reaction path detection program
27 being reaction path detection means. The reaction
scheme detection program 25 is a program for detecting
a chemical reaction scheme concerning a compound as
being a substrate and/or a product, which comprises
first process routine 25a to fourth process routine 25d.
The receptor information detection program 26 is a program
for detecting additional information about a receptor,
which comprises fifth process routine 26a to eighth
process routine 26d. Further, the reaction path detection
program 27 is a program for detecting a reaction
path of plural compounds, which comprises ninth process
routine 27a to thirteenth process routine 27e.

EP 0 829 810 A1

The receptor information detection program 26 can 
handle not only receptors intrinsic to living bodies, such 
as hormone receptors, but also receptors of drugs or 
the like, and conceptual receptors whose existence has
not yet been confirmed.

The second storage device 30 comprises a compound
information file 31, an enzyme information file 32,
a relation (correlation) information file 33, a partial
correlation data file 34, a bond table file (which will also be
referred to as a bond table information file) 35, and a
receptor information file 36. Among them, the compound
information file 31 stores a list to show the relationship
between compound numbers of compounds
and canonical data corresponding to the compounds,
and additional information (for example, the reference
data of Fig. 3) about the compounds. The enzyme information
file 32 stores a list to show the relationship
among enzyme numbers of enzymes, compound numbers
of compounds being substrates of the enzymes,
and compound numbers of compounds being products
by the enzymes, and additional information (for example,
the reference data of Fig. 4) about the enzymes.
Further, the relation information file 33 stores a list to
show the relationship among compound numbers of
compounds, enzyme numbers of enzymes with a relevant
compound being a substrate, enzyme numbers of
enzymes with a relevant compound being a product,
receptor numbers of receptors with a relevant compound
being an agonist, and receptor numbers of
receptors with a relevant compound being an antagonist.
Furthermore, the partial correlation data file 34
stores the reaction path information, and the
bond table file 35 stores the bond table data.
Moreover, the receptor information file 36 stores a
list to show the relationship among receptor numbers of
receptors, compound numbers of compounds being
agonists of the receptors, and compound numbers of
compounds being antagonists of the receptors, and
additional information (for example, the reference data
of Fig. 5) about the receptors.

Next explained is the detailed structure of the compound
information file 31, enzyme information file 32,
relation information file 33, and receptor information file
36. Fig. 2 is an example of a reaction path diagram to
show a path through which a compound of compound
number C1 changes in order to compounds of compound
numbers C2, C3, ... with plural enzymes of
enzyme numbers E1 to E6 as a catalyst, finally changing
into a compound of compound number C7, and is also
an example of a drawing to show circumstances in
which compounds C6-C12 serve as an agonist or as an
antagonist to receptors R1-R4.

The compound numbers C1-C7 described in this
example of reaction path diagram are recorded in the
compound information file 31 shown in Fig. 3. The compound
information file 31 includes a record of canonical
data corresponding to each compound of compound
number C1-C7, and the reference data (name, literature,
physical properties, etc.) about each compound of compound
number C1-C7 in the form of a list corresponding
to the compound numbers C1-C7. When access is
made to the compound information file 31, using the
compound number C1-C7 as a key, the canonical data
and reference data can be read out as to each compound
of compound number C1-C7. Here, the canonical
data is a plurality of symbolic data for uniquely specifying
the chemical structure of each compound. The
details of the canonical data will be described hereinafter.

The enzyme numbers E1-E6 described in the example
of reaction path diagram of Fig. 2 are recorded in the
enzyme information file 32 shown in Fig. 4. The enzyme
information file 32 includes a record of the compound
numbers C1-C6 of compounds being substrates of the
respective enzymes of enzyme numbers E1-E6, the
compound numbers C2-C7 of compounds being products
by the respective enzymes of enzyme numbers E1-E6,
and the reference data (name, literature, physical
properties, inhibitor, inducer, activator, etc.) about each
enzyme of enzyme number E1-E6 in the form of a list
corresponding to the enzyme numbers E1-E6.

Therefore, when access is made to the enzyme
information file 32 using the enzyme number E1-E6 as a
key, the compound numbers C1-C7 being the substrate
and product, and the reference data can be read out as
to each enzyme of enzyme number E1-E6. It is also possible
to similarly handle reactions by enzymes not yet
subjected to enzyme classification or identification,
nonenzymatic reactions involving light,
heat, acid, base, metal ion, or the like, and multi-step
reactions by a plurality of enzymes.

Further, the receptor numbers R1-R4 are recorded
in the receptor information file 36 shown in Fig. 5. The
receptor information file 36 includes a record of the
compound numbers C6, C10-C12 of the compounds
being agonists of the respective receptors of receptor
numbers R1-R4, the compound numbers C7-C9 of the
compounds being antagonists of the respective receptors
of receptor numbers R1-R4, and the reference data
(name, literature, physical properties, action, etc.) about
each receptor of receptor number R1-R4 in the form of a
list corresponding to the receptor numbers R1-R4.

Therefore, when access is made to the receptor
information file 36, using the receptor number R1-R4 as
a key, the compound numbers C6-C12 being the agonist
and antagonist, and the reference data can be read out
as to each receptor of the receptor number R1-R4.

Furthermore, the mutual relation among compound
numbers C1-C12, enzyme numbers E1-E6, and receptor
numbers R1-R4 is recorded in the relation information
file 33 shown in Fig. 6. Describing in more detail, the
enzyme numbers E1-E6 of enzymes with each compound
of compound number C1-C6 being a substrate,
the enzyme numbers E1-E6 of enzymes with each compound
of compound number C2-C7 being a product, and
the enzyme number E4 of the enzyme inhibited by the
compound of compound number C6 are recorded in the
form of a list corresponding to the compound numbers
C1-C7. In addition, the receptor numbers R1, R3, R4 of
receptors with each compound of compound number
C6, C10-C12 being an agonist, and the receptor numbers
R2, R4 of receptors with each compound of compound
number C7-C9 being an antagonist are recorded
in the form of a list corresponding to the compound
numbers C6-C12.

Therefore, when access is made to the relation
information file 33, using the compound number C1-C7
as a key, it is possible to read out the enzyme numbers
E1-E6 of the enzymes with each compound of compound
number C1-C7 being a substrate or a product,
and the enzyme number E4 of the enzyme inhibited by
the compound of compound number C6. When access
is made to the relation information file 33, using the
compound number C6-C12 as a key, it is possible to read
out the receptor numbers R1-R4 of the receptors with
each compound of compound number C6-C12 being an
agonist or an antagonist.

Next, the data contents of the enzyme information
file 32 will be explained specifically. First, from the reaction
path diagram of Fig. 2, a compound number of a
compound being a substrate for the enzyme of enzyme
number E1 is C1. A compound number of a compound
being a product by the enzyme of the enzyme number
E1 is C2. Therefore, C1 is recorded in the column of
(substrate) compound number corresponding to the
enzyme number E1 in the enzyme information file 32 of
Fig. 4. In addition, C2 is recorded in the column of (product)
compound number corresponding to the enzyme
number E1.

Similarly, from the reaction path diagram of Fig. 2, a
compound number of a compound being a substrate for
the enzyme of enzyme number E2 is C2. Further, a compound
number of a compound being a product by the
enzyme of enzyme number E2 is C3. Therefore, C2 is
recorded in the column of (substrate) compound
number corresponding to the enzyme number E2 in the
enzyme information file 32 of Fig. 4. Also, C3 is recorded
in the column of (product) compound number corresponding
to the enzyme number E2.

Such relation also holds for the enzyme numbers
E3-E6 similarly, so that the compound numbers C3-C7
along the reaction path diagram of Fig. 2 are recorded in
each of the columns of (substrate) compound number
and (product) compound number corresponding to the
enzyme numbers E3-E6.

Next, the data contents of the receptor information
file 36 will be described specifically. As shown in Fig. 5,
the compound number C6 of the compound being an
agonist for a receptor of receptor number R1 is recorded
in the column of (agonist) compound number. Also, a
compound number C8 of a compound being an antagonist
for a receptor of receptor number R2 is recorded in
the column of (antagonist) compound number. Further,
compound numbers C10, C11 of compounds being agonists
for a receptor of receptor number R3 are recorded
in the column of (agonist) compound number. Furthermore,
a compound number C12 of a compound being an
agonist for a receptor of receptor number R4 is
recorded in the column of (agonist) compound number
while compound numbers C7, C9 of compounds being
antagonists for the receptor of receptor number R4 are
recorded in the column of (antagonist) compound
number. The relation between these receptor numbers
and compound numbers is apparent from the reaction
path diagram of Fig. 2.

Next, the data contents of the relation information
file 33 will be described specifically. First, from the reaction
path diagram of Fig. 2, the enzyme number of the
enzyme with the compound of compound number C1
being a substrate is E1. Therefore, E1 is recorded in the
column of (substrate) enzyme number corresponding to
the compound number C1 in the relation information file
33 of Fig. 6.

Similarly, from the reaction path diagram of Fig. 2,
the enzyme number of the enzyme with the compound
of compound number C2 being a substrate is E2. Also,
the enzyme number of the enzyme with the compound
of compound number C2 being a product is E1. Therefore,
E2 is recorded in the column of (substrate) enzyme
number corresponding to the compound number C2 in
the relation information file 33 of Fig. 6. Also, E1 is
recorded in the column of (product) enzyme number
corresponding to the compound number C2.

Such relation also holds for the compound numbers
C3-C7 similarly, so that the enzyme numbers E2-E6
along the reaction path diagram of Fig. 2 are recorded in
each of the columns of (substrate) enzyme number and
(product) enzyme number corresponding to the compound
numbers C3-C7 (which are used as a key upon
search using the relation information file 33). Further,
the compound of compound number C6 is a substrate
for the enzyme number E6 and a product for the enzyme
number E5, while being an inhibitor for the enzyme
number E4, and thus, E4 is recorded in the column of
(inhibition) enzyme number.

Furthermore, the receptor number R1 of the receptor
for which the compound of compound number C6 is an
agonist is recorded in the column of (agonism) receptor
number. Also, the receptor number R4 of the receptor for
which the compound of compound number C7 is an
antagonist is recorded in the column of
(antagonism) receptor number. Following in the similar
fashion, the receptor numbers R2-R4 of agonist/antagonist
for the compounds of compound numbers C8-C12
are recorded in each column of (agonism) receptor
number/(antagonism) receptor number.
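
The relation information file described above is, in effect, an index keyed by compound number over the enzyme and receptor information files. The text does not say how the file is populated; the following sketch shows one way such an index could be derived, as an assumption. The field names, the function name `build_relation_file`, and the handful of sample rows are hypothetical, and the (inhibition) enzyme number column is left out for brevity.

```python
from collections import defaultdict

# Hypothetical sample rows; only a few entries, for illustration.
ENZYME_FILE = {
    "E1": {"substrates": ["C1"], "products": ["C2"]},
    "E2": {"substrates": ["C2"], "products": ["C3"]},
}
RECEPTOR_FILE = {
    "R1": {"agonists": ["C6"], "antagonists": []},
    "R4": {"agonists": ["C12"], "antagonists": ["C7", "C9"]},
}


def build_relation_file(enzyme_file, receptor_file):
    """Invert the enzyme and receptor files into rows keyed by compound number."""
    def empty_row():
        return {"substrate_of": [], "product_of": [], "agonist_of": [], "antagonist_of": []}

    relation = defaultdict(empty_row)
    for enzyme, row in enzyme_file.items():
        for compound in row["substrates"]:
            relation[compound]["substrate_of"].append(enzyme)   # (substrate) enzyme number
        for compound in row["products"]:
            relation[compound]["product_of"].append(enzyme)     # (product) enzyme number
    for receptor, row in receptor_file.items():
        for compound in row["agonists"]:
            relation[compound]["agonist_of"].append(receptor)   # (agonism) receptor number
        for compound in row["antagonists"]:
            relation[compound]["antagonist_of"].append(receptor)  # (antagonism) receptor number
    return dict(relation)


RELATION_FILE = build_relation_file(ENZYME_FILE, RECEPTOR_FILE)
```

With such an index, a search that starts from a compound number needs one lookup to find every enzyme and receptor related to it.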

Next, the flow of data in the biochemical information
processing apparatus 1 is shown in Fig. 7. First, an
operator draws a molecular structure diagram on the
display 40 using the mouse 51, and then this molecular
structure diagram is stored as image data 80 in the
image memory 10. This image data 80 can be converted
into any one of bond table data 81, canonical
data 82, and three-dimensional data 83.

Conversion between the image data 80 and the
bond table data 81 can be made using a graphic library
corresponding to the OS used. The conversion algorithm
between the bond table data 81 and the canonical
data 82 will be described in detail hereinafter. The conversion
algorithm between the bond table data 81 and
the three-dimensional data 83 is described in
"Abstracts, The 13th symposium of information science,
p 25" by the present inventor.

The bond table data 81 after conversion is stored in
the bond table file 35, the canonical data 82 in the work
memory 11, and the three-dimensional data 83 in the
image memory 10, respectively. When the operator
gives input of symbolic data 84 indicating a name or the
like, using the keyboard 52, a search process 84b by a
character string is carried out on the compound information
file 31, and bond table data 81 is made from
canonical data of a relevant compound. This bond table
data 81 can also be converted similarly into either of the
image data 80 and the three-dimensional data 83. In
contrast, when the symbolic data 84 indicating an
enzyme name or the like is input, the search process
84b by a character string is carried out on the enzyme
information file 32 to read a corresponding enzyme
number out thereof, which can be used for the subsequent
processes.

Fig. 8A to Fig. 8C show a specific example of image
data 80a, bond table data 81a, and canonical data 82a.
Fig. 8A is the image data 80a to show the molecular
structure of compound "4-methylpyridine". This image
data 80a can be converted into the bond table data 81a
shown in Fig. 8B. The bond table data 81a is a table in
which the number of atoms, the number of bonds, coordinates
of each atom, an element symbol of each element,
and so on are recorded. Using this bond table
data 81a, structures of all compounds can be expressed
as numerical data.

Further, the bond table data 81a can be converted 
into the canonical data 82a shown in Fig. 8C. The 
canonical data 82a is a symbolic string including an
array of numerals, marks, and so on. As shown in Fig. 
8C, the canonical data 82a of compound ''4-methylpyri- 
dine" is "1%1%1-2%3%5%N/6%7r. In this way, the 
canonical data 82a can express the structure of a com- 
pound in the form of a very short symbolic string. 
For this reason, if this canonical data 82a is applied, for
example, to a compound search system, the search
speed can be increased and the storage resource can
be effectively utilized.

It is, however, not easy to uniquely specify a compound
with the bond table data described above, and the bond
table data is thus not suitable for use in a compound
search system. Namely, as shown in Fig. 9A to Fig. 9C,
the image data 80b expresses the same compound as the
image data 80a, but the bond table data 81b is utterly
different from the bond table data 81a. It is seen from
this that a compound cannot be uniquely specified from
the bond table data. In contrast, the canonical data 82b
obtained by converting the bond table data 81b is the
same as the canonical data 82a, and can uniquely specify
the compound.
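
The contrast between bond table data and canonical data can be illustrated with a small sketch. The brute-force canonicalization below is a stand-in chosen for brevity, not the conversion algorithm of this document, and the function name, the 0-based input numbers, and the use of bond type 4 for the aromatic ring are assumptions made for the example.

```python
from itertools import permutations

def canonical_form(elements, bonds):
    """Return a numbering-independent description of a small molecule.

    elements: list of element symbols indexed by input number (0-based).
    bonds: list of (atom_i, atom_j, bond_type) triples.
    Tries every renumbering and keeps the smallest description, so it is
    only practical for a handful of atoms.
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):  # perm[old number] = new number
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = elements[old]
        edges = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), t)
            for i, j, t in bonds
        )
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best

# 4-methylpyridine entered twice with different input orders
# (heavy atoms only; assumed bond types: 4 = aromatic, 1 = single).
elements_a = ["C", "C", "C", "N", "C", "C", "C"]
bonds_a = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 4, 4),
           (4, 5, 4), (5, 0, 4), (0, 6, 1)]
elements_b = ["N", "C", "C", "C", "C", "C", "C"]
bonds_b = [(3, 6, 1), (0, 1, 4), (1, 2, 4), (2, 3, 4),
           (3, 4, 4), (4, 5, 4), (5, 0, 4)]

same_raw_table = (elements_a, bonds_a) == (elements_b, bonds_b)
same_canonical = (canonical_form(elements_a, bonds_a)
                  == canonical_form(elements_b, bonds_b))
```

Here the two raw tables differ while their canonical forms agree, which is the property the paragraph above attributes to the canonical data 82a and 82b.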

In the bond table data 81a and 81b, the recorded data
is divided into two tables: one with the columns from
atom number to mass, and one with the columns from
bonding atom pair to UP/DOWN. Accordingly, for example,
in the bond table data 81a, the atom number (4) and the
element symbol (N) correspond to each other, but the
atom number (4) does not correspond to the bonding atom
pair (4 5), the type of bond (1), or UP/DOWN (0).

In particular, as shown in Fig. 10A to Fig. 10C, the two
image data 80c, 80d look completely different from each
other, though both are image data indicating the same
compound. The canonical data 82c resulting from
conversion of such image data 80c, 80d is the same, thus
proving that the canonical data can uniquely specify a
compound.

As described, the canonical data is superior to the
bond table data in that it can uniquely specify a
compound, and therefore, the canonical data is mainly
used in each process of the biochemical information
processing apparatus 1 of the present embodiment.
On the other hand, since the bond table data has
the coordinate data, it is useful for displaying a
molecular structure diagram of a compound on the display
40. Further, the two-dimensional coordinate data
(X-coordinate and Y-coordinate) can be obtained by
calculation from other data in the bond table data
(though it is of course necessary to designate in
advance the lengths of bonds, the angles between bonds,
the position of the center when displayed on the
display, and so on).

Next, the biochemical information processing
method according to the embodiment of the present
invention will be explained. The biochemical information
processing apparatus 1 is used for this processing
method. First, under control of the OS 21, the main
program 23 of the biochemical information processing
program 22 is started.

In the main program 23, as shown in the flowchart
of Fig. 11, a selection screen of input method is first
indicated on the display 40 (S100). When, in accordance
with this screen indication, the operator selects input
through the mouse 51 (S101), a screen for drawing a
molecular structure diagram is indicated on the display
40. When the operator next inputs a molecular structure
diagram indicating the structure of a predetermined
compound using the mouse 51, this graphic image is
accepted as image data and stored in the image memory
10 (S102). This image data is also indicated on the
display 40 (S103). Then this image data is converted
into bond table data in accordance with the conversion
algorithm discussed above (S104).

When, in accordance with the screen indication of
S100, the operator selects input through the keyboard
52 (S101), a symbolic string input screen is indicated on
the display 40. When the operator next inputs a symbolic
string such as a compound name, a chemical formula, or
the like for specifying a predetermined compound using
the keyboard, this input is accepted (S105), a search
for the compound specified by this symbolic string
(S106) is carried out on the compound information file
31, and the bond table data 81 is prepared from the
canonical data 82 of the pertinent compound (S106b).
Then the bond table data is converted into image data,
based on the aforementioned two-dimensional coordinate
data (S107), and this image data is indicated on the
display 40 (S108).

On the other hand, when symbolic data 84 indicating
an enzyme name or the like is given through input by the
keyboard 52, a search by a character string (S106) is
carried out on the enzyme information file 32, and the
pertinent enzyme number is read out thereof to be used
in similar processing.

After completion of the processing at S104 or at
S108, a selection screen for selecting one of the
following processes is indicated on the display 40
(S109). When, in accordance with this screen indication,
the operator selects a save process of the bond table
data, the bond table data is written into the bond table
file 35 (S111). After completion of writing into the bond
table file 35, the processing returns to S109. When, in
accordance with the screen indication of S109, the
operator selects a three-dimensional indication process,
the three-dimensional indication program 24 is called
(S112). The three-dimensional indication program 24 is
a processing program for three-dimensionally indicating
a molecular structure diagram of a compound. After
completion of the process of the three-dimensional
indication program 24, the processing returns to S109.

Further, when, in accordance with the screen
indication of S109, the operator selects a reaction
scheme detection process, the reaction scheme detection
program 25 is called (S113). The reaction scheme
detection program 25 is a processing program for
searching the relation information file 33 or the like
and detecting a reaction scheme involving the compound.
After completion of the process of the reaction scheme
detection program 25, the processing returns to S109.
Furthermore, when, in accordance with the screen
indication of S109, the operator selects a reaction path
detection process, the reaction path detection program
27 is called (S114). The reaction path detection
program 27 is a processing program for searching the
relation information file 33 or the like and detecting a
reaction path of plural compounds. After completion of
the process of the reaction path detection program 27,
the processing returns to S109.

Moreover, when, in accordance with the screen
indication of S109, the operator selects a receptor
information indication process, the receptor information
detection program 26 is called (S115). The receptor
information detection program 26 is a processing
program for searching the relation information file 33 to
read out an agonism receptor number and/or an
antagonism receptor number of a specific compound (the
sixth process routine 26b), searching the receptor
information file 36 to detect the reference data for the
receptor of the receptor number thus read out (the
seventh process routine 26c), and further indicating the
reference data thus detected (the eighth process routine
26d). After completion of the process of the receptor
information detection program 26, the processing
returns to S109. Furthermore, when, in accordance with
the screen indication of S109, the operator selects a
termination process, the entire processing of the main
program is terminated.
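
The selection loop around S109 can be pictured as a simple dispatch table. This is only a sketch under stated assumptions: the function name, the choice keys, and the strings returned by the handlers are hypothetical, and each handler stands in for one of the programs described above.

```python
def run_main_menu(selections, handlers):
    """Loop on the S109 selection screen until termination is chosen.

    selections: iterable of operator choices (stand-in for screen input).
    handlers: dict mapping a choice to a callable (hypothetical names).
    """
    log = []
    for choice in selections:
        if choice == "terminate":
            log.append("terminated")
            break
        handler = handlers.get(choice)
        if handler is None:
            log.append(f"ignored:{choice}")
            continue
        log.append(handler())  # afterwards, control returns to S109
    return log

handlers = {
    "save": lambda: "S111 bond table saved",
    "3d": lambda: "S112 three-dimensional indication",
    "scheme": lambda: "S113 reaction scheme detection",
    "path": lambda: "S114 reaction path detection",
    "receptor": lambda: "S115 receptor information detection",
}

history = run_main_menu(["save", "scheme", "terminate", "path"], handlers)
```

Every process other than termination falls through to the next selection, mirroring the repeated "returns to S109" in the description.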

Next explained using the flowchart of Fig. 12 is the
process of the three-dimensional indication program 24
called at S112. In this process, first, the bond table
data is converted into the three-dimensional data of a
molecular structure diagram in accordance with the
above-described conversion algorithm (S120). Then an
input prompt screen as to whether rotation indication or
the like of this three-dimensional data is required is
indicated on the display 40 (S121). When start of the
three-dimensional indication program 24 is selected on
this screen, the three-dimensional data is converted into
image data, using the graphic library corresponding to
the OS used (S124), and this image data is indicated on
the display 40 (S125). Further, when, in accordance with
this screen indication, the operator selects one of a
change process of conformation, a rotation process, an
enlargement process, and a reduction process (S122),
the selected process is carried out by ordinary
formation techniques of three-dimensional graphics
(S123).

Next explained using the flowchart of Fig. 13 is the
process of the reaction scheme detection program 25
called at S113. In this process, first, the bond table
data is converted into canonical data in accordance with
the conversion algorithm discussed hereinafter (S130).
Then a selection screen of search object is indicated on
the display 40 (S131). Here, in the case of the operator
selecting a reaction scheme, it is preferable that the
compound input have been designated in advance as
either a substrate or a product at the previous S102 or
S105. Alternatively, immediately before the process of
S130, input of designation of either a substrate or a
product may be accepted together with the bond table
data for the compound.

Under such conditions, when, in accordance with
the screen indication of S131, the operator selects a
reaction scheme (S132), the following reaction scheme
detection process is carried out. In this process, first,
access is made to the compound information file 31 to
search for a compound (S133). This search process is
carried out based on the canonical data of the compound
obtained by the conversion at S130. When this search
process ends with the result that the same canonical
data as the canonical data of the compound does not
exist in the compound information file 31 (S134), the
process is terminated. If the same canonical data as the
canonical data of the compound exists in the compound
information file 31, the compound number corresponding
to this canonical data is read out of the compound
information file 31.

Based on the compound number (a key) read out at
S133, an enzyme number (according to the aforementioned
designation) of an enzyme with the compound being a
substrate or a product is read out of the relation
information file 33 (S135). Further, based on the enzyme
number read out at S135, a (substrate) compound number,
a (product) compound number, and reference data
corresponding to this enzyme number are read out of the
enzyme information file 32 (S136).

In this manner a reaction scheme diagram involving
the compound is prepared from the compound number
read out at S133 and the enzyme number read out at
S135, and the image data of this reaction scheme
diagram is indicated on the display 40. Also, the
reference data about the enzyme read out at S136 is
indicated on the display 40 (S137).
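
The lookups of S133 to S136 amount to three keyed reads. The sketch below models the compound information file 31, the relation information file 33, and the enzyme information file 32 as in-memory dictionaries; the dictionary layout, the function name, and every record in it ("canon-A", "C1", "E1", "example enzyme") are hypothetical placeholders rather than data from the embodiment.

```python
# Hypothetical contents standing in for files 31, 33, and 32.
COMPOUND_FILE = {  # canonical data -> compound number
    "canon-A": "C1",
    "canon-B": "C2",
}
RELATION_FILE = {  # compound number -> enzyme numbers, by role of the compound
    "C1": {"substrate": ["E1"], "product": []},
    "C2": {"substrate": [], "product": ["E1"]},
}
ENZYME_FILE = {  # enzyme number -> substrates, products, reference data
    "E1": {"substrates": ["C1"], "products": ["C2"], "name": "example enzyme"},
}

def detect_reaction_schemes(canonical, role):
    """Return (substrates, enzyme name, products) for each matching enzyme.

    role is "substrate" or "product", i.e. the designation given on input.
    """
    compound_no = COMPOUND_FILE.get(canonical)          # S133: search file 31
    if compound_no is None:                             # S134: no match, stop
        return []
    schemes = []
    for enzyme_no in RELATION_FILE[compound_no][role]:  # S135: read file 33
        record = ENZYME_FILE[enzyme_no]                 # S136: read file 32
        schemes.append(
            (record["substrates"], record["name"], record["products"])
        )
    return schemes                                      # drawn at S137

found = detect_reaction_schemes("canon-A", "substrate")
```

Each returned triple carries what the display step needs: the compounds on either side of the arrow and the enzyme name to place near it.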

The image data of reaction scheme diagram is indi- 
cated on the display 40 preferably in such an arrange- 
ment that an arrow combines a molecular structure 
diagram of the compound of the (substrate) compound 
number obtained with a molecular structure diagram of 
the compound of (product) compound number and that 
the reference data of enzyme (especially, the name) is 
placed near the arrow. Conversion from the compound 
number to the molecular structure diagram may be car- 
ried out, for example, in the order of the compound 
number, the bond table data (making access to the 
bond table file), and the molecular structure diagram 
(using the two-dimensional coordinates). 

Here, the first process routine 25a performs the
processes of from S130 to S133, and these processes
correspond to the first step. Also, the second process
routine 25b performs the process of S135, and this
process corresponds to the second step. Further, the
third process routine 25c performs the process of S136,
and this process corresponds to the third step. Yet
further, the fourth process routine 25d performs the
process of S137, and this process corresponds to the
fourth step.

In the present invention, the first process portion, 
step and process routine, the fifth process portion, step 
and process routine, and the ninth process portion, step 
and process routine may be the same process portion, 
step and process routine, respectively. 

Next, when, in accordance with the screen
indication of S131, the operator selects a molecular
structure diagram (S132), the following molecular
structure diagram detection process is carried out. In
this process, first, access is made to the compound
information file 31 to search for a compound of
detection object (S138). The search process is carried
out based on the canonical data of the compound obtained
by the conversion at S130. If this search process ends
with the result that the same canonical data as the
canonical data of the detection object does not exist in
the compound information file 31 (S139), the process is
terminated. If the same canonical data as the canonical
data of the detection object exists in the compound
information file 31, the compound number of the compound
corresponding to this canonical data is read out of the
compound information file 31.

Based on the compound number read out at S138,
the reference data etc. is read out of the compound
information file 31 and the relation information file 33
(S140). In this manner a molecular structure diagram of
the compound being the detection object is prepared from
the compound number read out at S138, and the image
data of this molecular structure diagram is indicated on
the display 40. The reference data for this compound
read out at S140 is also indicated on the display 40
(S141).

Next explained using the flowcharts of Fig. 14 and
Fig. 15 is the process of the reaction path detection
program 27 called at S114. In this process, first, the
bond table data of the center compound is converted into
canonical data in accordance with the conversion
algorithm discussed hereinafter, and subsequently, in
order to determine the reaction path area to be detected,
input of the predetermined number of reaction steps (for
example, three reaction steps on the upstream side and
five reaction steps on the downstream side of the center
compound) is accepted (S150).

Next, access is made to the compound information
file 31 to search for the center compound, based on the
canonical data obtained by the conversion at S150
(S151). If this search process ends with the result that
the same canonical data as the canonical data of the
center compound does not exist in the compound
information file 31 (S152), the process is terminated.
If the same canonical data as the canonical data of the
center compound exists in the compound information file
31, the compound number corresponding to this canonical
data is read out of the compound information file 31.

Based on the compound number (a key) read out at
S151, an enzyme number of an enzyme with this compound
being a substrate and an enzyme number of an enzyme with
this compound being a product are read out of the
relation information file 33 (S153). Further, based on
each enzyme number read out at S153, a compound number
of a compound being a substrate for this enzyme and a
compound number of a compound being a product by this
enzyme are read out of the enzyme information file 32
(S154). Then the enzyme numbers read out at S153 and the
compound numbers read out at S154 are successively added
into the partial correlation data file 34 (S155).

The processes of from S153 to S155 are repeated
for each compound number newly read out at S154, and
the compound numbers of all compounds and the enzyme
numbers of all enzymes within the reaction path of the
predetermined number of steps are written into the
partial correlation data file 34 (S156).
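
The repetition of S153 to S155 can be read as a step-limited, breadth-first walk outward from the center compound. The sketch below follows that reading, treating "upstream" as moving from a product back to its substrates and "downstream" as moving from a substrate to its products; this interpretation, the function name, the dictionary standing in for the enzyme information file 32, and the sample numbers are all assumptions.

```python
def collect_reaction_path(center, enzymes, upstream_steps, downstream_steps):
    """Gather compound and enzyme numbers within the requested step counts.

    enzymes: {enzyme number: (substrate numbers, product numbers)}.
    Returns a dict standing in for the partial correlation data file 34.
    """
    partial = {"compounds": {center}, "enzymes": set()}

    def walk(max_steps, downstream):
        frontier = {center}
        seen = {center}
        for _ in range(max_steps):
            next_frontier = set()
            for enzyme_no, (subs, prods) in enzymes.items():
                sources, targets = (subs, prods) if downstream else (prods, subs)
                if frontier & set(sources):              # S153: enzyme touches frontier
                    partial["enzymes"].add(enzyme_no)    # S155: record the enzyme
                    next_frontier |= set(targets) - seen # S154: newly read compounds
            seen |= next_frontier
            partial["compounds"] |= next_frontier        # S155: record the compounds
            frontier = next_frontier

    walk(upstream_steps, downstream=False)
    walk(downstream_steps, downstream=True)
    return partial

# Hypothetical linear pathway C1 -> C2 -> C3 -> C4 -> C5.
example_enzymes = {
    "E1": (["C1"], ["C2"]),
    "E2": (["C2"], ["C3"]),
    "E3": (["C3"], ["C4"]),
    "E4": (["C4"], ["C5"]),
}
path = collect_reaction_path("C3", example_enzymes, 1, 1)
```

Scanning every enzyme record per step keeps the sketch short; the described apparatus instead uses the relation information file 33 as an index from compound number to enzyme numbers.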

Next, when a predetermined enzyme is designated
in the reaction path in accordance with an instruction of
the operator (S157), a compound being a substrate for
this enzyme and a compound being a product by this
enzyme are read out of the compound information file
31 and the enzyme information file 32, and reaction
scheme data is prepared from these compounds and the
enzyme (S158). Then this reaction scheme data is
indicated on the display 40 (S159). Further, access is
made to the partial correlation data file 34 to obtain
all adjacent reactions of this reaction scheme, and
arrows indicating these adjacent reactions are indicated
on the display 40 (S160).

When the operator selects indication of one of the
adjacent reactions, based on the reaction scheme
data thus indicated on the display 40 (S161), the flow
returns to the process of S157 to prepare the reaction
scheme data for that adjacent reaction.

Here, the ninth process routine 27a performs the
processes of S150 and S151, and these processes
correspond to the ninth step. Also, the tenth process
routine 27b performs the process of S153, and this
process corresponds to the tenth step. Further, the
eleventh process routine 27c performs the process of
S154, and this process corresponds to the eleventh step.
Furthermore, the twelfth process routine 27d performs
the process of S156, and this process corresponds to the
twelfth step. Moreover, the thirteenth process routine
27e performs the processes of from S157 to S161, and
these processes correspond to the thirteenth step.

Examples of indications on the display 40 by the
processes of S159 and S160 are shown in Fig. 16 and
Fig. 17. As seen from these drawings, the image data
80f, 80g each indicating the reaction scheme data is
displayed on the display 40, and arrows indicating
adjacent reactions are added to both ends of the
reaction scheme data. Selection of an adjacent reaction
at S161 is effected by clicking a portion of one of the
arrows with the mouse 51. In this example, when the arrow
at the left end of the image data 80f is clicked with the
mouse 51, the image data 80g, which is the reaction one
step before, is indicated. Any reaction scheme within
the reaction path can be freely indicated by such
switching of the screen.

Next explained are canonical data preparation 
means and method suitably applicable to the present 
invention. 

Algorithms applicable as the aforementioned
conversion algorithm between the bond table data 81 and
the canonical data 82 in either direction include the
known Morgan algorithm (H. L. Morgan, J. Chem. Doc., 5(2),
107 (1965)) and the conversion algorithm by the present
inventor, as described in "Abstracts, The 13th
symposium of information science, p 25." The
conventional conversion algorithm by the present inventor
was able to obtain the canonical data more quickly than
the Morgan algorithm without intervention of a process
for classifying atoms into equivalent atoms. However,
because the attribute of an atom used therein was the
number of atoms located at a specific minimum distance
from the pertinent atom, it lacked precision in the
determination of equivalent atoms, and the reliability
of the canonical data obtained was not yet sufficient.
Accordingly, the present invention particularly
preferably employs the canonical data preparation means
and method described in detail in the following.

First explained is the canonical data preparation
means suitably applicable to the present invention. The
biochemical information processing apparatus 1, being
the embodiment of the present invention shown in Fig.
1, comprises the canonical data preparation means
according to the present invention; that is, it comprises
the image memory 10 for storing the image data of a
molecular structure diagram, the work memory 11 for
temporarily storing the symbolic data or the like, the
first storage device 20 storing the operating system
(OS) 21 and the canonical data preparation program 91,
and the second storage device 30 storing the bond table
file 35 and the compound information file 31.

The biochemical information processing apparatus
1 comprises the display 40 for indicating the molecular
structure diagram, the mouse 51 being a pointing
device for accepting input of a hand-drawn graphic
image, the keyboard 52 for accepting input of symbolic
data such as a chemical formula, the printer 60 for
outputting the molecular structure diagram, and the CPU
70 for controlling execution or the like of the
canonical data preparation program 91. The pointing
devices include a tablet, a digitizer, a light pen, and
so on as well as the mouse 51, and any one of these
devices may replace the mouse 51.

The canonical data preparation program 91 is a
program for preparing the canonical data based on
characteristic data about each of the atoms constituting
a compound and bond pair data between atoms. This
canonical data preparation program 91 comprises a
main routine 91a for generally controlling the
processing, and a constituent atom classification
routine (constituent atom classification process
portion) 91b for assigning class numbers to the
respective atoms constituting the compound. The
canonical data preparation program 91 also comprises a
canonical number assignment routine (canonical number
assignment process portion) 91c for assigning canonical
numbers to the respective atoms, based on the class
numbers, and a canonical data preparation routine
(canonical data preparation process portion) 91d for
preparing canonical data, based on the canonical numbers
of the respective atoms.

The second storage device 30 is provided with the
bond table file 35 capable of storing a plurality of bond
tables 81. A bond table 81 includes a record of
characteristic data about each of the atoms constituting
the compound and bond pair data between atoms, and the
canonical data preparation program 91 can make
access to these data through the bond table 81.

As shown in Fig. 18A and Fig. 18B, a bond table 81
comprises an atomic table 81c including a record of
characteristic data about the respective atoms, and an
atomic pair table 81d including a record of bonding pair
data between atoms. Specifically, the atomic table 81c
is provided with columns in which the input number (also
referred to as the number of the atom), the
two-dimensional coordinates (X-coordinate and
Y-coordinate) of the atom, the element symbol (which is
generally an element name), the attribute, the number of
atoms, and the number of bonds are to be written (see
Fig. 18A), and the atomic pair table 81d is provided with
columns in which the bonding atom pair data, the type of
bond (for example, 1 for single bond and 2 for double
bond), and the structure (a column for distinction as to
whether each atom belongs to a cyclic part or to a chain
part of the molecular structure diagram) are to be
written (see Fig. 18B). Here, the input numbers are
numbers for the computer to identify the atoms
constituting the compound, and are numerals in the
example of Fig. 18A, but may be symbols. The bonding atom
pair data is preferably expressed as a combination of
input numbers.

The preparation of canonical data does not require
all of the data in the above atomic table 81c and atomic
pair table 81d; it is sufficient to have the number and
element symbol of each atom as characteristic data and
the bonding atom pair data and type of bond as bonding
pair data.
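
As a rough picture of this layout, the two tables can be modeled as lists of records, with a helper that extracts only the fields named above as sufficient for preparing canonical data. The class and field names are hypothetical, the less clearly described columns (attribute, number of atoms, number of bonds) are omitted, and the sample fragment is not taken from any figure of this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class AtomRecord:
    """One row of the atomic table 81c (subset of its columns)."""
    input_number: int
    element: str
    x: Optional[float] = None  # two-dimensional coordinates, if known
    y: Optional[float] = None

@dataclass
class BondRecord:
    """One row of the atomic pair table 81d."""
    pair: Tuple[int, int]           # bonding atom pair, as two input numbers
    bond_type: int                  # 1 = single bond, 2 = double bond
    in_ring: Optional[bool] = None  # cyclic part or chain part

@dataclass
class BondTable:
    atoms: List[AtomRecord] = field(default_factory=list)
    bonds: List[BondRecord] = field(default_factory=list)

    def canonical_inputs(self):
        """Return only the data stated to be sufficient for canonical data."""
        atoms = [(a.input_number, a.element) for a in self.atoms]
        bonds = [(b.pair, b.bond_type) for b in self.bonds]
        return atoms, bonds

# Hypothetical three-atom fragment C-C=O.
table = BondTable(
    atoms=[AtomRecord(1, "C", 0.0, 0.0), AtomRecord(2, "C", 1.0, 0.0),
           AtomRecord(3, "O")],
    bonds=[BondRecord((1, 2), 1, False), BondRecord((2, 3), 2)],
)
minimal = table.canonical_inputs()
```

Coordinates and ring flags remain available for drawing, while `canonical_inputs` shows that they play no part in identifying the compound.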

The second storage device 30 stores the compound
information file 31 including a record of a list showing
the relation between the compound number of a compound
and the canonical data corresponding to the compound. As
shown in Fig. 3, the compound information file 31 is a
file including a record of the canonical data
corresponding to each compound of compound numbers C1-C7
and the reference data (name, literature, physical
properties, etc.) about each compound of compound
numbers C1-C7, in the form of a list corresponding to
the compound numbers C1-C7. Therefore, if access is
made to the compound information file 31 using the
compound number C1-C7 as a key, the canonical data
and reference data can be read out for each compound
of compound numbers C1-C7. Here, the canonical data is
data comprised of a plurality of symbols for uniquely
specifying the chemical structure of each compound.

The constituent atom classification routine 91b cor- 
responds to the constituent atom classification step, the 
canonical number assignment routine 91c to the canon- 
ical number assignment step, and the canonical data 
preparation routine 91d to the canonical data
preparation step, respectively.

Next explained is the schematic operation of the
canonical data preparation means. As shown in Fig. 19,
the operator manipulates the mouse 51 or the keyboard
52 to prepare, in the bond table file 35, a bond table 81
of the compound for which canonical data is to be
prepared.

Input through the mouse 51 is handwritten input of
the molecular structure diagram of a compound on the
display 40 with the mouse 51, and an input number of
each atom defined in the input order is written in the
column of input number in the bond table 81 prepared in
the second storage device 30. Further, bonding atom
pair data indicating the bond relation of each atom of
this molecular structure diagram is written into the
column of bonding atom pair in the bond table 81. As
described, in the case of the input through the mouse
51, the bond table 81 for specifying a compound is
prepared from the handwritten molecular structure
diagram E1.

Input through the keyboard 52 is input of a symbolic
string for specifying a bond table name corresponding
to a predetermined compound using the keyboard 52,
and, based on the input symbolic data 11a, the bond
table 81 specified by this bond table name is read out
of the bond table file 35.

As described, the mouse 51 and keyboard 52 compose
input means A (50), and a bond table 81 is obtained using
either one of the mouse 51 and keyboard 52. Then the
canonical data preparation program 91, being canonical
data preparation means B, is carried out to prepare the
canonical data 82, based on each data in the bond table
81. The canonical data 82 thus prepared is written into
the compound information file 31 to be saved therein.
Here, the reason why the canonical data 82 is prepared
from the bond table 81 to be saved is that its storage
area is smaller than when the bond table 81 itself is
saved, and a compound can be uniquely specified. Namely,
the canonical data 82 prepared based on the bond table
81 shown in Figs. 18A and 18B is "1%1%1-2%3%5%N/6%7",
and can express the structure of the compound uniquely
by a very short string of characters, numerals, and
symbols. By employing such a short symbolic string as
the object to be saved, the storage resource can be
effectively utilized, which can contribute to size and
weight reductions of the apparatus.

The two-dimensional coordinate calculation process
is carried out based on each data in the bond table
81, thereby obtaining two-dimensional coordinate data
of each atom. A molecular structure diagram E2 that is
aesthetically well arranged is prepared from the
two-dimensional coordinate data thus obtained. The
molecular structure diagram E2 thus prepared can be
indicated on the display 40 or can be output from the
printer 60.

The input through the keyboard 52 may be
arranged to directly write the aforementioned data or
the like indicating bonding states of atoms into the bond
table 81 prepared in the second storage device 30.
Input of bond table data may be accepted using a
device for optically reading graphics or characters, such
as an image scanner or an optical card reader (OCR),
as the input device of the present invention.

Next explained is the canonical data preparation 
method being the embodiment according to the present 
invention. The canonical data preparation means 
described above is used for this preparation method. 
First, the main routine 91a of the canonical data
preparation program 91 is started under control of the
OS 21.

As shown in the flowchart of Fig. 20, the main
routine 91a first calls the constituent atom
classification routine 91b to assign a class number to
each of the atoms forming a compound (S910). Next, the
canonical number assignment routine 91c is called to
assign a canonical number to each atom, based on the
class numbers assigned to the respective atoms (S920).
Further, the canonical data preparation routine 91d is
called to prepare canonical data, based on the canonical
numbers assigned to the respective atoms (S930). The
canonical data thus prepared is written into the
compound information file 31 to be saved therein.

Next explained is the process of the constituent atom
classification routine 91b called at S910. This process
classifies the atoms constituting the compound into
classes of mutually equivalent atoms and gives each atom
a class number corresponding to the class to which that
atom belongs. For example, since all atoms of benzene
are equivalent, the same class number is given to all of
them. In contrast, since the atoms of toluene are not
all equivalent to one another, different class numbers
are given to the non-equivalent atoms.

As shown in the flowchart of Fig. 21, first, three
types of attributes (a_i, b_ij, d_ij) are given to each
of the atoms constituting the compound, based on the bond
table 81 (S911). Here, attribute a_i is a kind number of
the atom of input number i (which is an atomic number in
this example). Also, attribute b_ij is the number (vector
quantity) of bonds that are adjacent to the atom of
input number i and have a kind number (which is a type
of bond in this example (1 for single bond, 2 for double
bond, 3 for triple bond, 4 for aromatic bond, ...)) of j.
Further, attribute d_ij is the number (vector quantity)
of routes that can be traced from the atom of input
number i via j bonds in the shortest path.

Next, the attributes (a_i, b_ij, d_ij) are arranged for
each atom to obtain a 9-digit numeral string, class
numbers C_i^0 are given to the atoms in the ascending
order of the numeral strings from the smallest, and then
the atoms are classified into a plurality of classes
(S912). The class numbers C_i^0 given herein are
zeroth-degree class numbers, and first-degree class
numbers C_i^1, second-degree class numbers C_i^2, ... are
successively obtained in the loop process after S913.

Next, the degree n is set to 1 (S913). Then attribute
V_ij^n is given to each atom (S914). The attribute V_ij^n
is the number of atoms bonding to the atom of input
number i and having a class number j in the degree n - 1.
Further, the attributes (a_i, b_ij, d_ij, V_ij^n) are
arranged for each atom, class numbers C_i^n are given in
the ascending order of the numeral strings from the
smallest, and the atoms are classified into a plurality
of classes (S915). Then it is checked whether the number
N_n of classes is equal to N_(n-1), and the process is
terminated if equal. Or, it is checked whether the
number N_n of classes is equal to the total number of
atoms, and the process is terminated if equal (S916).
When neither is equal, 1 is added to n and the processing
returns to S914 (S917).
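
The loop of S911 to S917 can be sketched as follows. This is an illustrative reading rather than the routine 91b itself: d_ij is interpreted here as the number of atoms whose shortest-path distance from atom i is j bonds, the vectors are not truncated to fixed digit counts, a safety bound on the number of iterations is added, and the function and variable names are hypothetical.

```python
from collections import deque

def classify_atoms(elements, bonds):
    """Assign class numbers by iterative refinement (sketch of S911-S917).

    elements: {input number: atomic number}
    bonds: {(i, j): bond type}, types 1-4 as in the text
    """
    atoms = sorted(elements)
    nbrs = {i: [] for i in atoms}
    for (i, j), t in bonds.items():
        nbrs[i].append((j, t))
        nbrs[j].append((i, t))

    def b_attr(i):  # bonds at atom i, counted per bond type
        counts = [0, 0, 0, 0]
        for _, t in nbrs[i]:
            counts[t - 1] += 1
        return tuple(counts)

    def d_attr(i):  # atoms at shortest distance 1, 2, ... (assumed meaning)
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v, _ in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        counts = [0] * (len(atoms) - 1)
        for k in dist.values():
            if k > 0:
                counts[k - 1] += 1
        return tuple(counts)

    def rank(keys):  # ascending order of keys -> class numbers 1, 2, ...
        order = {k: n + 1 for n, k in enumerate(sorted(set(keys.values())))}
        return {i: order[keys[i]] for i in atoms}

    base = {i: (elements[i], b_attr(i), d_attr(i)) for i in atoms}  # S911
    classes = rank(base)                                            # S912
    for _ in range(len(atoms)):           # safety bound (not in the text)
        n_prev = len(set(classes.values()))
        keys = {}
        for i in atoms:                                             # S914
            v = [0] * n_prev
            for j, _ in nbrs[i]:
                v[classes[j] - 1] += 1
            keys[i] = base[i] + (tuple(v),)
        classes = rank(keys)                                        # S915
        n_now = len(set(classes.values()))
        if n_now == n_prev or n_now == len(atoms):                  # S916
            break
    return classes

# Benzene with aromatic bonds (type 4): every atom should share one class.
benzene = classify_atoms(
    {i: 6 for i in range(1, 7)},
    {(1, 2): 4, (2, 3): 4, (3, 4): 4, (4, 5): 4, (5, 6): 4, (1, 6): 4},
)
```

For benzene the loop stops after the first pass because the number of classes does not change, which matches the first termination condition of S916.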

Next, the process in each step of the constituent atom
classification routine 91b will be explained in detail
with an example of
3,5-dimethyl-2,3,4,5-tetrahydropyridine.

First executed is the process of S911. Upon execution
of this process the data as shown in Figs. 22A and
22B has already been written in the bond table 81 and,
based on each data written in the bond table 81, the
three types of attributes (a_i, b_ij, d_ij) are given to
each atom. Here, the input numbers recorded in this bond
table 81 are arbitrary numbers given in the order of
handwritten input of each atom, as shown in Fig. 23.

The attribute a_i is obtained as follows. As described
previously, the attribute a_i is a kind number of the atom of
input number i. Here, an element symbol of each atom
is recorded in the bond table 81, and the kind numbers
can be obtained from these element symbols. Therefore,
by reading an element symbol out of the bond table 81,
the attribute a_i corresponding to this element symbol
can be obtained. As a result, we obtain a_1, a_2, a_4 to
a_8 = 6, and a_3 = 7.

The attribute b_ij is obtained as follows. As discussed
previously, the attribute b_ij is the number of
bonds adjoining an atom of input number i and having a
bond kind number thereof being j. A type of bond of
each atom is recorded in the bond table 81, and the
attribute b_ij can be obtained by reading this type of bond
out of the bond table 81. As a result, we obtain
b_1j = (3, 0, 0, 0), b_2j = (1, 1, 0, 0), b_3j = (1, 1, 0, 0),
b_4j = (2, 0, 0, 0), b_5j = (3, 0, 0, 0), b_6j = (2, 0, 0, 0),
b_7j = (1, 0, 0, 0), and b_8j = (1, 0, 0, 0).

Specifically, the attribute b_ij is obtained using the
reference table T shown in Figs. 24A and 24B. The reference
table T is formed as a matrix D(x, y) indicating
the bond relation between two atoms, and is prepared
based on the data of bonding atom pair and type of
bond in the bond table 81. Namely, a type of bond j is
written in a matrix element indicated by each bonding
atom pair, thus preparing the reference table T.

Extraction of attribute b_ij using this reference table T
is carried out as follows. First, matrix elements satisfying
x = 1 or y = 1 (the matrix elements hatched in Fig.
24A) are checked among those of the reference table T
to extract data (type of bond) j written in the matrix
elements. As a result, we obtain D(1, 2) = 1, D(1, 6) = 1,
and D(1, 8) = 1. Since all data j of the three matrix
elements thus obtained are 1, we obtain b_11 = 3. Since
there is no matrix element with data j being two or more,
we obtain b_12 to b_14 = 0.

Next, matrix elements satisfying x = 2 or y = 2 (the
matrix elements hatched in Fig. 24B) are checked
among those of the reference table T to extract data
written in the matrix elements. As a result, we obtain
D(1, 2) = 1 and D(2, 3) = 2. The data j of the matrix
elements thus obtained are 1 and 2, each occurring once,
and thus b_21 = b_22 = 1. Since there is no matrix element
with data j being 3 or more, we obtain b_23 = b_24 = 0.

Through the same process for i = 3 to 8, the attributes
b_ij (i = 1 to 8, j = 1 to 4) shown in Fig. 25 are obtained.
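
The extraction of b_ij described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine of the embodiment: the dictionary BONDS stands in for the filled elements D(x, y) of the reference table T, and the names BONDS and bond_kind_counts are hypothetical.

```python
from collections import Counter

# Filled elements D(x, y) of the reference table T for the worked example:
# (input number x, input number y) -> bond kind j (1 single, 2 double, ...).
BONDS = {(1, 2): 1, (1, 6): 1, (1, 8): 1, (2, 3): 2,
         (3, 4): 1, (4, 5): 1, (5, 6): 1, (5, 7): 1}

def bond_kind_counts(bonds, atom, kinds=4):
    """Return (b_i1, ..., b_i4): how many bonds of each kind touch `atom`."""
    # Checking "x = i or y = i" selects the hatched row and column of T.
    counts = Counter(kind for (x, y), kind in bonds.items() if atom in (x, y))
    return tuple(counts.get(j, 0) for j in range(1, kinds + 1))

b = {i: bond_kind_counts(BONDS, i) for i in range(1, 9)}
```

For the eight atoms of the example this reproduces the vectors b_1j to b_8j listed above.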

Further, the attribute d_ij is obtained as follows. As
discussed previously, the attribute d_ij is the number of
routes that can be traced from an atom of input number 
i through j bonds in the shortest path. Specifically, 
describing it based on the molecular structure diagram 
of Fig. 23, routes that can be traced from the atom of 
input number 1 through one bond are three in total: 
(input number 1 to input number 2); (input number 1 to 
input number 6); (input number 1 to input number 8). 
Routes that can be traced from the atom of input 
number 1 through two bonds are two in total: (input 
number 1 to input number 2 to input number 3); (input 
number 1 to input number 6 to input number 5). 

Further, routes that can be traced from the atom of 
input number 1 through three bonds in the shortest path 
are three in total: (input number 1 to input number 2 to 
input number 3 to input number 4); (input number 1 to 
input number 6 to input number 5 to input number 4); 
(input number 1 to input number 6 to input number 5 to 
input number 7). Moreover, there is no route tracing 
from the atom of input number 1 through four bonds in 
the shortest path. From the results of the above processes,
we obtain d_1j = (3, 2, 3, 0).

Through the same processes, we obtain d_2j = (2, 3, 2, 2),
d_3j = (2, 2, 4, 0), d_4j = (2, 3, 2, 2), d_5j = (3, 2, 3, 0),
d_6j = (2, 4, 2, 0), d_7j = (1, 2, 2, 3), and d_8j = (1, 2, 2, 3).
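
The route counts can be checked with a breadth-first traversal that carries, for every atom reached, the number of shortest routes leading to it. The following Python sketch is an assumption-laden illustration rather than the reference-table procedure of the embodiment; BONDS and shortest_route_counts are hypothetical names.

```python
from collections import defaultdict

# Bonding atom pairs of the worked example (bond kinds are not needed here).
BONDS = [(1, 2), (1, 6), (1, 8), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7)]

def shortest_route_counts(bonds, start, max_len=4):
    """Return (d_i1, ..., d_i4): number of shortest routes of each length."""
    adj = defaultdict(list)
    for x, y in bonds:
        adj[x].append(y)
        adj[y].append(x)
    dist = {start: 0}      # shortest bond count from `start`
    routes = {start: 1}    # number of shortest routes reaching each atom
    frontier, d = [start], [0] * max_len
    for step in range(1, max_len + 1):
        reached = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:          # first time this atom is reached
                    dist[v], routes[v] = step, 0
                    reached.append(v)
                if dist[v] == step:        # every shortest route is counted
                    routes[v] += routes[u]
        d[step - 1] = sum(routes[v] for v in reached)
        frontier = reached
    return tuple(d)

d = {i: shortest_route_counts(BONDS, i) for i in range(1, 9)}
```

Atom 7 seen from atom 2 is reached by two distinct four-bond routes, which is why d_24 is 2 although only one atom lies at that distance.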

Specifically, the attributes d_ij are obtained referring
to the reference table T in the same manner as the
attributes b_ij. This extraction of attributes d_ij referring to
the reference table T is carried out in the order of i = 1,
i = 2, and so on. The attribute d_1j (i = 1) is first extracted.

The extraction of attribute d_1j (i = 1) is to check
matrix elements satisfying x = 1 or y = 1 (the matrix
elements hatched in Fig. 26A) among those of the reference
table T and to extract each matrix element in which
data is written. Then, 1 is written as a bond path number
in each matrix element extracted. As a result, the bond
path number 1 is written in D(1, 2), D(1, 6), and D(1, 8)
(each bond path number is shown as enclosed in a
triangle in Fig. 26A).

Next extracted are suffixes S = (1, 2), (1, 6), (1, 8) of
the matrix elements each having the bond path number
1 written. From these suffixes S, 1, which has been
used in the previous extraction process, is excluded,
thus obtaining S = 2, 6, 8. Based on S = 2, 6, 8 thus
obtained, matrix elements satisfying x = 2, 6, 8 or y = 2,
6, 8 (the matrix elements hatched in Fig. 26B) are
checked to extract each matrix element with data written
therein and with no bond path number written yet. Then,
2 is written as a bond path number in each matrix
element extracted. As a result, the bond path number 2 is
written in D(2, 3) and D(5, 6).

Further, extracted are suffixes S = (2, 3), (5, 6) of
the matrix elements with the bond path number 2 written
therein. From these suffixes S, 2 and 6, having already
been used in the previous extraction process, are
excluded, thus obtaining S = 3, 5. Based on S = 3, 5
thus obtained, matrix elements satisfying x = 3, 5 or y =
3, 5 (the matrix elements hatched in Fig. 27) are
checked to extract each matrix element with data written
therein and with no bond path number written yet.
Then, 3 is written as a bond path number in each matrix
element extracted. As a result, the bond path number 3
is written in D(3, 4), D(4, 5), and D(5, 7).

Through the above processes, the bond path numbers
are written in all the matrix elements. As a result,
there are three matrix elements with the bond path
number 1, two matrix elements with the bond path
number 2, three matrix elements with the bond path
number 3, and no matrix element with the bond path
number 4, thus attaining d_1j = (3, 2, 3, 0).

Next, the attribute d_2j (i = 2) is extracted. The
extraction of attribute d_2j (i = 2) is to check matrix
elements satisfying x = 2 or y = 2 (the matrix elements
hatched in Fig. 28A) among those of the reference table
T and to extract each matrix element with data written
therein. Then, 1 is written as a bond path number in
each matrix element extracted. As a result, the bond
path number 1 is written in D(1, 2) and D(2, 3) (each
bond path number is shown as enclosed in a triangle in
Fig. 28A).

Next extracted are suffixes S = (1, 2), (2, 3) of the
matrix elements each with the bond path number 1 written
therein. Excluding 2, having already been used in
the previous extraction process, from these suffixes S,
we obtain S = 1, 3. Based on S = 1, 3 thus obtained,
matrix elements satisfying x = 1, 3 or y = 1, 3 (the
matrix elements hatched in Fig. 28B) are checked to
extract each matrix element with data written therein and
with no bond path number written yet. Then, 2 is written
as a bond path number in each matrix element
extracted. As a result, the bond path number 2 is written
in D(1, 6), D(1, 8), and D(3, 4).

Further, extracted are suffixes S = (1, 6), (1, 8), (3, 4)
of the matrix elements with the bond path number 2
written therein. Excluding 1 and 3, having already been
used in the previous extraction process, from these
suffixes S, we obtain S = 4, 6, 8. Based on S = 4, 6, 8 thus
obtained, matrix elements satisfying x = 4, 6, 8 or y = 4,
6, 8 (the matrix elements hatched in Fig. 29A) are
checked to extract each matrix element with data written
therein and with no bond path number written yet. Then,
3 is written as a bond path number in each matrix
element extracted. As a result, the bond path number 3 is
written in D(4, 5) and D(5, 6).

Furthermore, extracted are suffixes S = (4, 5), (5, 6)
of the matrix elements with the bond path number 3
written therein. Excluding 4 and 6, having already been
used in the previous extraction process, from these
suffixes S, we obtain S = 5, 5 (which means that S = 5 is
doubly applied). Based on S = 5, 5 thus obtained, matrix
elements satisfying x = 5 or y = 5 (the matrix elements
hatched in Fig. 29B) are checked to extract each matrix
element with data written therein and with no bond path
number written therein yet. Then, 4 is written as a bond
path number in each matrix element extracted. As a
result, two of the bond path number 4 are written in
D(5, 7).

Through the above processes, the bond path numbers
are written in all the matrix elements. As a result,
there are two matrix elements with the bond path
number 1, three matrix elements with the bond path
number 2, two matrix elements with the bond path
number 3, and two matrix elements with the bond path
number 4, thus attaining d_2j = (2, 3, 2, 2).

By the same processes for i = 3 to 8, d_ij (i = 1 to 8,
j = 1 to 4) shown in Fig. 25 are obtained. The process of
S911 as described above gave the three types of
attributes (a_i, b_ij, d_ij) to each of the atoms constituting
3,5-dimethyl-2,3,4,5-tetrahydropyridine.

Next executed is the process of S912. As described
above, at S912 the attributes (a_i, b_ij, d_ij) for each atom
are arranged in a 9-digit numeral string, and class numbers
C_i^0 are given to the atoms in ascending order
of the numeral strings, starting from the smallest, thus
classifying the atoms into a plurality of classes. The class
numbers C_i^0 given here are zeroth-degree class numbers.

Describing the process of S912 specifically, the
numeral string of the atom of input number 1 is
"630003230" and the numeral string of the atom of input
number 2 is "611002322". Continuing in order, we
obtain "711002240", "620002322", "630003230",
"620002420", "610001223", and "610001223".

As a result, the numeral strings of the atoms of
input numbers 7 and 8 are minimum, so that the class
number C_7^0 = C_8^0 = 1 is given to these atoms. Similarly,
the class number C_2^0 = 2 is given to the atom of input
number 2, and the class number C_4^0 = 3 to the atom of
input number 4. Also, the class number C_6^0 = 4 is given
to the atom of input number 6, and the class number
C_1^0 = C_5^0 = 5 to the atoms of input numbers 1 and 5.
Further, the class number C_3^0 = 6 is given to the atom
of input number 3 (see Fig. 30A). The atoms are classified
into the six classes in this manner, and thus the
number N_0 of classes is 6.
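
The step S912 amounts to concatenating the attributes into a fixed-width key and ranking the distinct keys. A minimal Python sketch follows; it assumes that every attribute value fits in one digit (true for this example, so string order equals numeric order), and the names A, B, D, numeral_string and rank_classes are hypothetical.

```python
# Attributes of the worked example, as listed in the text.
A = {1: 6, 2: 6, 3: 7, 4: 6, 5: 6, 6: 6, 7: 6, 8: 6}
B = {1: (3, 0, 0, 0), 2: (1, 1, 0, 0), 3: (1, 1, 0, 0), 4: (2, 0, 0, 0),
     5: (3, 0, 0, 0), 6: (2, 0, 0, 0), 7: (1, 0, 0, 0), 8: (1, 0, 0, 0)}
D = {1: (3, 2, 3, 0), 2: (2, 3, 2, 2), 3: (2, 2, 4, 0), 4: (2, 3, 2, 2),
     5: (3, 2, 3, 0), 6: (2, 4, 2, 0), 7: (1, 2, 2, 3), 8: (1, 2, 2, 3)}

def numeral_string(i):
    """Concatenate a_i, b_i1..b_i4 and d_i1..d_i4 into a 9-digit string."""
    return str(A[i]) + "".join(map(str, B[i])) + "".join(map(str, D[i]))

def rank_classes(keys):
    """Give equal keys the same class number, counting up from 1."""
    order = {key: rank
             for rank, key in enumerate(sorted(set(keys.values())), start=1)}
    return {i: order[key] for i, key in keys.items()}

strings = {i: numeral_string(i) for i in A}
C0 = rank_classes(strings)
```

Atoms with identical strings (here 7 and 8, and 1 and 5) fall into the same class, giving six zeroth-degree classes.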

Next, the process of S913 is carried out to set the
degree n to 1.

Further, the process of S914 is carried out. As
described previously, the attribute V_ij^1 is given to
each atom at S914. Here, the attribute V_ij^1 is the
number of atoms bonding to an atom of input number i
and having a class number of j. Namely, describing it
based on the molecular structure diagram of Fig. 30B,
input numbers of atoms bonding to the atom of input
number 1 are 2, 6, 8, and the class numbers of these
atoms are C_2^0 = 2, C_6^0 = 4, and C_8^0 = 1. As a result, 1
is written in the attribute V_1j^1 of j = 1, 2, 4, thus obtaining
V_1j^1 = (1, 1, 0, 1, 0, 0).

Also, input numbers of atoms bonding to the atom
of input number 2 are 1, 3, and the class numbers of
these atoms are C_1^0 = 5 and C_3^0 = 6. As a result, 1 is
written in the attribute V_2j^1 of j = 5, 6, thus obtaining
V_2j^1 = (0, 0, 0, 0, 1, 1). The same processes for the atoms
of input numbers 3 to 8 will result in obtaining
V_3j^1 = (0, 1, 1, 0, 0, 0), V_4j^1 = (0, 0, 0, 1, 1, 0),
V_5j^1 = (1, 0, 1, 1, 0, 0), V_6j^1 = (0, 0, 0, 0, 2, 0),
V_7j^1 = (0, 0, 0, 0, 1, 0), and V_8j^1 = (0, 0, 0, 0, 1, 0).

Specifically, the attributes V_ij^1 are obtained using
the reference table T shown in Figs. 24A and 24B.
Extraction of attributes V_ij^1 using this reference table T
is carried out in the order of i = 1, i = 2, and so on. First,
attribute V_1j^1 (i = 1) is extracted. Extraction of attribute
V_1j^1 (i = 1) is to check the matrix elements satisfying
x = 1 or y = 1 (the matrix elements hatched in Fig. 24A)
among the matrix elements of the reference table T and
to extract suffixes S = (1, 2), (1, 6), (1, 8) of the matrix
elements with data written therein. Excluding i = 1 from
these suffixes S, we obtain S = 2, 6, 8. Substituting the
values of S thus obtained into the class number C_i^0, we
obtain C_2^0 = 2, C_6^0 = 4, and C_8^0 = 1. Then, 1 is written
in the attribute V_1j^1 of j = 1, 2, 4, thus obtaining
V_1j^1 = (1, 1, 0, 1, 0, 0).

Next, the attribute V_2j^1 (i = 2) is extracted. The
extraction of attribute V_2j^1 (i = 2) is to check the matrix
elements satisfying x = 2 or y = 2 (the matrix elements
hatched in Fig. 24B) among the matrix elements of the
reference table T and to extract suffixes S = (1, 2), (2, 3)
of the matrix elements with data written therein. Excluding
i = 2 from these suffixes S, we obtain S = 1, 3. The
values of S thus obtained are substituted into the class
number C_i^0, thus obtaining C_1^0 = 5 and C_3^0 = 6. Then 1
is written in the attribute V_2j^1 of j = 5, 6, thus attaining
V_2j^1 = (0, 0, 0, 0, 1, 1).

The same processes for i = 3 to 8 will result in
obtaining the attributes V_ij^1 (i = 1 to 8, j = 1 to 6) shown
in Fig. 31.
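
The attribute V_ij^1 is a histogram of the class numbers of the atoms bonded to atom i. The sketch below illustrates this for the worked example; it is not the code of the embodiment, and BONDS, C0 and neighbor_class_counts are hypothetical names.

```python
# Bonding atom pairs and zeroth-degree class numbers of the worked example.
BONDS = [(1, 2), (1, 6), (1, 8), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7)]
C0 = {1: 5, 2: 2, 3: 6, 4: 3, 5: 5, 6: 4, 7: 1, 8: 1}

def neighbor_class_counts(bonds, classes, atom):
    """Return V_i: for each class number j, how many neighbours have it."""
    v = [0] * max(classes.values())
    for x, y in bonds:
        if atom == x:
            v[classes[y] - 1] += 1
        elif atom == y:
            v[classes[x] - 1] += 1
    return tuple(v)

V1_atom1 = neighbor_class_counts(BONDS, C0, 1)
V1_atom2 = neighbor_class_counts(BONDS, C0, 2)
V1_atom6 = neighbor_class_counts(BONDS, C0, 6)
```

This reproduces the vectors worked through above for atoms 1 and 2. Applied to atom 4, which bonds to atoms 3 and 5 (classes 6 and 5), the same function gives (0, 0, 0, 0, 1, 1) rather than the listed (0, 0, 0, 1, 1, 0); the class order obtained at S915 is the same either way, because the leading class number of atom 4 is not shared with any other atom.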

Next executed is the process of S915. As described
previously, at S915 the attributes (C_i^(n-1), V_ij^n) are
arranged for each atom, and class numbers C_i^n are
given to the atoms in ascending order of the
numeral strings, starting from the smallest, thus
classifying the atoms into a plurality of classes.

Specifically, the numeral string of the atom of input
number 1 is "5110100" and the numeral string of the
atom of input number 2 is "2000011". Continuing in
order, we obtain "6011000", "3000110", "5101100",
"4000020", "1000010", and "1000010".

As a result, the numeral strings of the atoms of
input numbers 7 and 8 are minimum, and the class
number C_7^1 = C_8^1 = 1 is given to these atoms. Similarly,
the class number C_2^1 = 2 is given to the atom of input
number 2, and the class number C_4^1 = 3 to the atom of
input number 4. Further, the class number C_6^1 = 4 is
given to the atom of input number 6, and the class
number C_5^1 = 5 to the atom of input number 5. Furthermore,
the class number C_1^1 = 6 is given to the atom of
input number 1, and the class number C_3^1 = 7 to the
atom of input number 3. The atoms are classified into
the seven classes in this manner, and the number N_1 of
classes is 7.

The process of S916 is next executed to check
whether the number N_n of classes is equal to N_(n-1), and
the process is terminated if equal. Also, whether the
number N_n of classes is equal to the total atom number
is checked, and the process is terminated if equal. Here,
since the number N_1 of classes is 7 and the number N_0
of classes is 6, N_1 is not equal to N_0. Also, since the
total number of atoms is 8, the number N_1 of classes is
not equal to the total number of atoms. Since neither is
equal in this way, the process of S917 is executed to set
n to 2.

Further, the process returns to S914 to give the
attribute V_ij^2 to each atom. As a result, as shown in Fig.
32, we obtain V_1j^2 = (1, 1, 0, 1, 0, 0, 0),
V_2j^2 = (0, 0, 0, 0, 0, 1, 1), V_3j^2 = (0, 1, 1, 0, 0, 0, 0),
V_4j^2 = (0, 0, 0, 0, 1, 0, 1), V_5j^2 = (1, 0, 1, 1, 0, 0, 0),
V_6j^2 = (0, 0, 0, 0, 1, 1, 0), V_7j^2 = (0, 0, 0, 0, 1, 0, 0),
and V_8j^2 = (0, 0, 0, 0, 0, 1, 0).

Then the process of S915 is carried out to give the
class number C_i^2 to each atom. As a result, as shown in
Fig. 30C, we obtain C_1^2 = 7, C_2^2 = 3, C_3^2 = 8, C_4^2 = 4,
C_5^2 = 6, C_6^2 = 5, C_7^2 = 2, and C_8^2 = 1. The atoms are
classified into the eight classes in this manner, and the
number N_2 of classes is 8. Since the number of classes
N_2 = 8 is equal to the total number of atoms, the process
is terminated by determination at S916.
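
The loop S914 to S917 can be summarized as an iterative refinement of class numbers. The following Python sketch is one possible reading of that loop, starting from the zeroth-degree classes of the example; it compares tuples instead of digit strings (equivalent while every count is a single digit), and the names BONDS, C0, rank_classes and refine_classes are hypothetical.

```python
BONDS = [(1, 2), (1, 6), (1, 8), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7)]
C0 = {1: 5, 2: 2, 3: 6, 4: 3, 5: 5, 6: 4, 7: 1, 8: 1}

def rank_classes(keys):
    """Give equal keys the same class number, counting up from 1."""
    order = {key: rank
             for rank, key in enumerate(sorted(set(keys.values())), start=1)}
    return {i: order[key] for i, key in keys.items()}

def refine_classes(bonds, classes):
    """Repeat S914-S917 until the class count stops growing or equals N."""
    adj = {i: [] for i in classes}
    for x, y in bonds:
        adj[x].append(y)
        adj[y].append(x)
    n_prev = len(set(classes.values()))
    while True:
        keys = {}
        for i in classes:
            v = [0] * n_prev                   # S914: V_ij^n
            for nb in adj[i]:
                v[classes[nb] - 1] += 1
            keys[i] = (classes[i], tuple(v))   # S915: (C_i^(n-1), V_ij^n)
        classes = rank_classes(keys)
        n_now = len(set(classes.values()))
        if n_now == n_prev or n_now == len(classes):   # S916
            return classes
        n_prev = n_now                         # S917: next degree

final_classes = refine_classes(BONDS, C0)
```

For the example the loop runs twice (six classes, then seven, then eight) and ends because every atom is in a class of its own.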

Next explained using the flowchart of Fig. 33 is the
process of canonical number assignment routine 91c
called at S920 of Fig. 20. Here, a canonical number is a
number of each atom uniquely determined depending
upon the structure of a compound. Namely, an input
number given by handwritten input of a molecular structure
diagram is an arbitrary number that changes with
the input order. In contrast, the
canonical data 82 is unique data depending only on the
structure of the compound. Therefore, it is difficult to directly
make the unique canonical data 82 from the arbitrary
input numbers. Thus, the canonical data preparation
program 91 enables smooth preparation of canonical
data 82 by converting the input numbers once into
canonical numbers and preparing the canonical data 82
based on the unique canonical numbers.

In the process of canonical number assignment
routine 91c, first, 1 is given to variable k (S921). Next,
the final class numbers C_i^n obtained in the constituent
atom classification routine 91b are checked, and a
canonical number k (k = 1 herein) is given to the atom
with the maximum class number (S922). If there are a
plurality of atoms with the maximum class number, an
arbitrary atom is selected out of these atoms, and the
canonical number k is given to this atom. After canonical
numbers have been assigned to all atoms, the process is
terminated (S923).

Next, 1 is added to the variable k (S924), and, out of
the atoms for each of which the canonical number is
decided (which will be referred to as decided atoms), each
decided atom to which an atom for which a canonical
number is not decided (which will be referred to as an
undecided atom) bonds is extracted (S925). Then
whether there are plural decided atoms extracted is
determined (S926), and if there are plural decided
atoms extracted, the decided atom with the minimum
canonical number is selected out of these decided
atoms (S927). Then an undecided atom with the maximum
class number is extracted out of the undecided
atoms bonding to the decided atom thus selected, and
the canonical number of this undecided atom is determined
as k (S928). If there are plural undecided atoms
with the maximum class number C_i^n, an arbitrary one is
selected out of these undecided atoms.

When one decided atom is determined at S926, an
undecided atom with the maximum class number C_i^n is
selected out of the undecided atoms bonding to this
decided atom and is given the canonical number k
(S929). After completion of the processes of S928 and
S929 the processing returns to S923, and the loop of
S923 to S929 is repeated until the canonical numbers
are assigned to all the atoms.

Next, the process of canonical number assignment
routine 91c is explained with a specific example using
3,5-dimethyl-2,3,4,5-tetrahydropyridine. First, 1 is given
to the variable k in the process of S921 and then the
process of S922 is carried out. In the process of S922,
since the atom of input number 3 has the maximum
C_3^2 = 8, the canonical number k = 1 is given to the atom
of input number 3. Next, the process of S924 is executed to
change the variable k to 2, and the process of S925 is
then carried out to extract the atom of input number 3 as
a decided atom.

Since there is one decided atom thus extracted, the
process of S929 is then carried out. Since undecided
atoms bonding to the atom of input number 3 are the
atoms of input numbers 2, 4, an atom with the maximum
class number C_i^2 is selected out of these atoms.
Namely, the class number of the atom of input number 2
is C_2^2 = 3, and the class number of the atom of input
number 4 is C_4^2 = 4. Thus, the atom of input number 4 is
selected, and the canonical number k = 2 is given to this
atom.

Next, the flow returns to the process of S924 to
change the variable k to 3, and the process of S925 is
carried out to extract the atoms of input numbers 3, 4 as
decided atoms. Since there are plural decided atoms
thus extracted, then the process of S927 is carried out
to select an atom with the minimum canonical number
out of the decided atoms thus extracted. Namely, the
canonical number of the atom of input number 3 is 1 and
the canonical number of the atom of input number 4 is
2. Thus, the atom of input number 3 is selected. Then
the process of S928 is carried out to give the canonical
number k = 3 to the atom of input number 2 bonding to
the atom of input number 3.

Further, the flow returns to the process of S924 to
change the variable k to 4, and the process of S925 is
carried out to extract the atoms of input numbers 2, 4 as
decided atoms. Since there are plural decided atoms
thus extracted, then the process of S927 is carried out
to select an atom with the minimum canonical number
out of the decided atoms thus extracted. Namely, the
canonical number of the atom of input number 2 is 3 and
the canonical number of the atom of input number 4 is
2. Thus, the atom of input number 4 is selected. Then
the process of S928 is carried out to give the canonical
number k = 4 to the atom of input number 5 bonding to
the atom of input number 4.

Repeating the same processes, the canonical
number 5 is assigned to the atom of input number 1 and
the canonical number 6 to the atom of input number 6.
Also, the canonical number 7 is given to
the atom of input number 7 and the canonical number 8
to the atom of input number 8.

After that, the process of S923 is carried out, and
because the canonical numbers are obtained for all the
atoms at this stage, the process is terminated. As a
result, the canonical numbers as shown in Fig. 34 are
obtained.
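
The assignment loop S921 to S929 can be sketched as follows. This is an illustrative Python reading of the flowchart, assuming a connected molecule and final class numbers that are all distinct (so the arbitrary tie-breaks never arise); BONDS, CLASSES and assign_canonical_numbers are hypothetical names.

```python
BONDS = [(1, 2), (1, 6), (1, 8), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7)]
CLASSES = {1: 7, 2: 3, 3: 8, 4: 4, 5: 6, 6: 5, 7: 2, 8: 1}   # final C_i^2

def assign_canonical_numbers(bonds, classes):
    """Map input number -> canonical number following S921 to S929."""
    adj = {i: [] for i in classes}
    for x, y in bonds:
        adj[x].append(y)
        adj[y].append(x)
    # S921/S922: the atom with the maximum class number gets k = 1.
    canon = {max(classes, key=classes.get): 1}
    for k in range(2, len(classes) + 1):          # S923/S924
        # S925: decided atoms that still have an undecided neighbour.
        open_atoms = [a for a in canon
                      if any(nb not in canon for nb in adj[a])]
        # S926/S927: choose the one with the minimum canonical number.
        parent = min(open_atoms, key=canon.get)
        # S928/S929: its undecided neighbour with the maximum class number.
        candidates = [nb for nb in adj[parent] if nb not in canon]
        canon[max(candidates, key=classes.get)] = k
    return canon

canon = assign_canonical_numbers(BONDS, CLASSES)
```

The numbering therefore grows outward from the highest-ranked atom, always extending from the earliest-numbered atom that still has unnumbered neighbours.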

Next explained using the flowchart of Fig. 35 is the
process of canonical data preparation routine 91d
called at S930. In this process, first, the input numbers
are replaced by the canonical numbers, as shown in
Figs. 36A and 36B, to rewrite the bond table 81 (S931).
Then, based on this bond table 81, three types of data
(P_i, T_i, S_i) are obtained for each atom (S932). Here, P_i is
the canonical number of the atom that bonds to the atom of
canonical number i (i > 1) and has the minimum
number. Also, T_i is a symbol for the type of bond between
the atom of canonical number i (i > 1) and the atom of
canonical number P_i (- for single bond, = for double
bond, # for triple bond, % for aromatic bond, and so on
in this example). Further, S_i is a symbol for the type of
the atom of canonical number i (i > 0) (which is an element
symbol in this case).

Specifically, first, the element symbol of the atom of
canonical number 1 is checked with reference to the
atomic table 81g. This will result in obtaining S_1 = "N".
Next, which atom bonds to the atom of canonical
number 2 is checked referring to the atomic pair table
81h. As a result, the atoms of canonical numbers 1, 4
are obtained. Since the minimum canonical number is 1
out of these atoms, P_2 = 1. Since the bond between the
atom of canonical number 2 and the atom of canonical
number 1 is a single bond, T_2 = "-". Further, S_2 = "C" is
obtained referring to the atomic table 81g.

Next, which atom bonds to the atom of canonical
number 3 is checked referring to the atomic pair table
81h. As a result, the atoms of canonical numbers 1, 5
are obtained. Since the minimum canonical number is 1
among these atoms, P_3 = 1. Since the bond between
the atom of canonical number 3 and the atom of canonical
number 1 is a double bond, T_3 = "=". Further, referring
to the atomic table 81g, S_3 = "C" is obtained. The
same processes to follow obtain P_4 = 2, P_5 = 3, P_6 = 4,
P_7 = 4, P_8 = 5, T_4 to T_8 = "-", and S_4 to S_8 = "C".

Next extracted is a bonding atom pair which was
not referred to upon obtaining T_i in the process of S932
(S933). This process is carried out referring to the
atomic pair table 81h. This will result in extracting a
bonding atom pair of the atom of canonical number 5
and the atom of canonical number 6. Then three types
of data (R_j^1, R_j^2, H_j) are obtained for the bonding atom
pair thus extracted (S934). Here, R_j^1, R_j^2 are canonical
numbers of two atoms constituting the bond. Also, H_j is
a symbol for a type of the bond (the same symbols as T_i
are used in this example). It is assumed that R_j^1 and R_j^2
satisfy the relation of R_j^1 < R_j^2. With another bonding
atom pair (R_k^1, R_k^2), they are supposed to satisfy the
relation of R_j^1 < R_k^1, or the relation of R_j^1 = R_k^1 and
R_j^2 < R_k^2.

The above processes prepared the canonical tree 
structure data shown in Fig. 37. 

Next, the data obtained in the processes of S932
and S934 is aligned in line, thus preparing canonical
data (S935). Namely, defining a delimiter F different
from the symbols for the types of atom and for the types
of bond, the data obtained in the processes of S932 and
S934 is aligned as follows.

S_1, P_2, T_2, S_2, P_3, T_3, S_3, P_4, T_4, S_4, ..., P_N, T_N, S_N,
F, R_1^1, H_1, R_1^2, F, R_2^1, H_2, R_2^2, F, ..., R_M^1, H_M, R_M^2, F

Here, N is the total number of atoms and M is the
total number of bonding atom pairs extracted at S934.
The data string thus obtained is canonical data
uniquely corresponding to the structure of the compound.
Specifically, using "/" as the delimiter F, the obtained
data is aligned in the predetermined order as follows.

"N1-C1=C2-C3-C4-C4-C5-C/5-6/"
Then this canonical data is written into the compound
information file 31 to be saved therein (S936). After that,
the process is terminated.
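
The steps S931 to S935 can be sketched in Python as below. This is an illustration under assumptions, not the routine of the embodiment: ring-closure pairs are written with the smaller canonical number first, as in the "/5-6/" of the example; no delimiter is emitted when no pair is extracted at S934, which the text does not specify; and the names BOND_SYMBOL, BONDS, SYMBOLS, CANON and canonical_data are hypothetical. The optional omit argument drops one atom symbol from the string.

```python
BOND_SYMBOL = {1: "-", 2: "=", 3: "#", 4: "%"}

# Worked example: bond table by input number, element symbols, and the
# canonical numbers obtained by the assignment routine.
BONDS = {(1, 2): 1, (1, 6): 1, (1, 8): 1, (2, 3): 2,
         (3, 4): 1, (4, 5): 1, (5, 6): 1, (5, 7): 1}
SYMBOLS = {1: "C", 2: "C", 3: "N", 4: "C", 5: "C", 6: "C", 7: "C", 8: "C"}
CANON = {3: 1, 4: 2, 2: 3, 5: 4, 1: 5, 6: 6, 7: 7, 8: 8}

def canonical_data(bonds, symbols, canon, delimiter="/", omit=None):
    # S931: rewrite the bond table with canonical numbers, smaller first.
    table = {tuple(sorted((canon[x], canon[y]))): kind
             for (x, y), kind in bonds.items()}
    sym = {canon[i]: ("" if s == omit else s) for i, s in symbols.items()}
    parts, used = [sym[1]], set()
    for i in range(2, len(sym) + 1):
        # S932: P_i is the smallest canonical number bonded to atom i.
        p = min(a if b == i else b for a, b in table if i in (a, b))
        pair = tuple(sorted((p, i)))
        used.add(pair)
        parts.append(f"{p}{BOND_SYMBOL[table[pair]]}{sym[i]}")
    # S933/S934: bonds not used above are listed in sorted order.
    for r1, r2 in sorted(set(table) - used):
        parts.append(f"{delimiter}{r1}{BOND_SYMBOL[table[(r1, r2)]]}{r2}")
    if len(table) > len(used):
        parts.append(delimiter)   # closing delimiter F
    return "".join(parts)

full = canonical_data(BONDS, SYMBOLS, CANON)
short = canonical_data(BONDS, SYMBOLS, CANON, omit="C")
```

The tree part of the string records each atom's parent, bond type and symbol, while the trailing "/5-6/" records the one ring-closing bond that the tree does not cover.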

The canonical data preparation means and method
according to the present invention are not limited to the
above embodiment, but may be modified within the
scope not departing from the spirit of the present
invention, for example as follows.

(1) The above embodiment used the data string
including the symbols S_i for the types of atom as the
canonical data, but the symbol for the type of the
atom with the highest frequency of occurrence
(which is normally C for carbon) may be excluded
from the data string. Namely, omitting the symbol
C for carbon from the above canonical data, we
obtain the following.

"N1-1=2-3-4-4-5-/5-6/"
Shortening the data string in this manner can
reduce the quantity of data written into the compound
information file 31.

(2) The following processes may be added to the
canonical number assignment routine 91c in the
case of a plurality of undecided atoms with the maximum
class number C_i^n being selected in the process
of S929.

(a) If an undecided atom with the maximum
class number C_i^n does not belong to a cyclic
structure portion, an arbitrary undecided atom
is selected out of the plurality of undecided
atoms and k is assigned as a canonical number
of this undecided atom. After that, the processing
returns to S923.

(b) If an undecided atom with the maximum
class number C_i^n belongs to a cyclic structure
portion, as to a structure obtained by cutting
bonds between the undecided atoms selected
at S929 (hereinafter referred to as candidate
atoms) and decided atoms bonding to these
candidate atoms, the following vector quantity
is defined for each candidate atom.

m_ik: the minimum bond number between candidate
atom i and the atom with canonical number k

The order of priority is preliminarily determined as 
to this attribute, and an atom i with the highest priority 
order is selected and k is assigned as a canonical 
number of the atom. After that, the process returns to 
S923. 

Here, criteria of judgment of priority order in
attribute values of atoms are as follows. First, non-vector
quantities depend upon the degree of priority order.
As for vector quantities, when elements of two vectors i,
k are attributes V_ij, V_kj, the magnitude at minimum j
among the elements with V_ij ≠ V_kj is employed as a
criterion of judgment of priority order. By employing such
criteria of judgment, priority orders of the attributes b_ij,
d_ij, V_ij^n, m_ik can be determined. In the case of priority
orders being determined by a plurality of attributes,
priority orders are preliminarily determined among the
attributes, and priority is given to judgment in an
attribute with a higher priority order.
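
The criterion for vector quantities is a comparison at the first element where two vectors differ. The Python sketch below shows that comparison; whether the larger or the smaller value takes priority is not stated here, so it is left as a parameter, and the names first_difference and has_priority are hypothetical.

```python
def first_difference(v_i, v_k):
    """Return (j, V_ij, V_kj) at the minimum 1-based j where they differ."""
    for j, (x, y) in enumerate(zip(v_i, v_k), start=1):
        if x != y:
            return j, x, y
    return None   # identical vectors: this attribute cannot decide

def has_priority(v_i, v_k, larger_wins=True):
    """Decide priority of vector i over vector k at the first difference.

    `larger_wins` is an assumption; the direction is fixed beforehand
    per attribute and is not given in this passage.
    """
    diff = first_difference(v_i, v_k)
    if diff is None:
        return False
    _, x, y = diff
    return x > y if larger_wins else x < y
```

For instance, comparing d_1j = (3, 2, 3, 0) with d_2j = (2, 3, 2, 2) is settled at j = 1, where the values are 3 and 2.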

The above canonical data preparation method
according to the present invention was used to obtain
the canonical data of the C60 molecule shown in Fig. 38A,
and the canonical data (Fig. 38B) for uniquely specifying
the structure of the C60 molecule was obtained in just
1.5 seconds. In contrast, when the canonical data
of the C60 molecule was obtained using an information
processing apparatus of the same performance by the
Morgan algorithm without intervention of the process for
classifying the atoms into equivalent atoms, 550 seconds
were needed to obtain the canonical data.
Therefore, if the above canonical data preparation
means and method are employed in the present
invention, the speed of the biochemical information
processing according to the present invention can be
improved remarkably.

The foregoing explained the preferred embodiment
of the biochemical information processing apparatus
and method of the present invention, but it should be
understood that the present invention is not limited to
the above embodiment.

For example, the canonical data preparation means
(the canonical data preparation program 91) according
to the present invention does not have to be incorporated
together with the other means (the reaction
scheme detection program 25 etc.) in the first storage
device in the biochemical information processing
apparatus of the present invention, but, as shown in Fig. 39
and Fig. 40, the canonical data preparation means (the
canonical data preparation program 91) according to
the present invention and the other means (the reaction
scheme detection program 25 etc.) may exist separately
from each other in the first storage device 20.

Also, the biochemical information processing apparatus
of the present invention does not have to comprise all of
the reaction scheme detection means (the reaction scheme
detection program) 25, the receptor information detection
means (the receptor information detection program) 26,
and the reaction path detection means (the reaction path
detection program) 27, but the apparatus may be arranged,
for example, to be provided with the reaction scheme
detection means (the reaction scheme detection program) 25
and the reaction path detection means (the reaction path
detection program) 27, as shown in Fig. 41, or to be
provided with only one of them. In this case, the receptor
information file 36 is not necessary, and the mutual
relation between the compound numbers C1-C7 and the
enzyme numbers E1-E6 described in the reaction path
diagram of Fig. 2 is recorded in the relation information
file 33 shown in Fig. 42. In more detail, the enzyme
numbers E1-E6 of the enzymes with each compound of
compound numbers C1-C6 being a substrate, the enzyme
numbers E1-E6 of the enzymes with each compound of
compound numbers C2-C7 being a product, and the enzyme
number E4 of the enzyme inhibited by the compound of
compound number C6 are recorded in the form of a list
corresponding to the compound numbers C1-C7. Therefore,
when access is made to the relation information file 33
using a compound number C1-C7 as a key, the apparatus
can read out the enzyme numbers E1-E6 of the enzymes
with each compound of compound number C1-C7 being a
substrate or a product, and the enzyme number E4 of the
enzyme inhibited by the compound of compound number C6.
The main program 23 in this case is the same as in
Fig. 11 except that it excludes step S115 for calling the
receptor information indication program, as shown in
Fig. 43.
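
The keyed access described above can be sketched as a small lookup table. This is only an illustration, not the layout of Fig. 42: the dictionary, the field names, and the function `enzymes_for` are hypothetical, and the sketch assumes a simple linear path in which enzyme E(i) converts compound C(i) into compound C(i+1), with compound C6 inhibiting enzyme E4.

```python
# Hypothetical in-memory stand-in for relation information file 33
# (receptor columns omitted, as in the reduced configuration).
# Assumption: a linear path where enzyme E(i) converts C(i) to C(i+1).
RELATION_FILE = {}
for i in range(1, 8):
    RELATION_FILE[f"C{i}"] = {
        # enzymes for which this compound is a substrate (C1..C6)
        "substrate_of": [f"E{i}"] if i <= 6 else [],
        # enzymes for which this compound is a product (C2..C7)
        "product_of": [f"E{i - 1}"] if i >= 2 else [],
        # enzymes inhibited by this compound (only C6 inhibits E4)
        "inhibits": ["E4"] if i == 6 else [],
    }


def enzymes_for(compound_number):
    """Return (enzymes with the compound as substrate or product, inhibited enzymes)."""
    record = RELATION_FILE[compound_number]
    return record["substrate_of"] + record["product_of"], record["inhibits"]


if __name__ == "__main__":
    for key in ("C1", "C6", "C7"):
        print(key, enzymes_for(key))
```

Using the compound number as the only key means a single lookup yields every enzyme touching that compound, which is the property the paragraph above relies on.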

Next explained is a biochemical information computer
program product (recording medium) according to an
embodiment of the present invention.

Fig. 44 is a block diagram showing the structure of
the biochemical information computer program product
(recording medium) 2 according to the embodiment of the
present invention. As shown in the drawing, the
biochemical information recording medium 2 of the present
embodiment comprises a file area 2b for recording files
and a program area 2a for recording programs. Recorded
in the file area 2b are a compound information file 31,
an enzyme information file 32, a relation information
file 33, a partial correlation data file 34, a bond table
file 35, and a receptor information file 36.

Among them, the compound information file 31 
stores a list showing the relation between compound 
numbers of compounds and canonical data correspond- 
ing to the compounds, and additional information (also 
referred to as reference data) about the compounds. 
The enzyme information file 32 stores a list showing the 
relation among enzyme numbers of enzymes, com- 
pound numbers of compounds being substrates for the 
enzymes, and compound numbers of compounds being 
products by the enzymes, and additional information
about the enzymes.

Further, the relation information file 33 stores a list
showing the relation among compound numbers of
compounds, enzyme numbers of enzymes with a relevant
compound being a substrate, enzyme numbers of enzymes
with a relevant compound being a product, receptor
numbers of receptors with a relevant compound being an
agonist, and receptor numbers of receptors with a
relevant compound being an antagonist. Furthermore, the
partial correlation data file 34 is prepared to store
the reaction path information, and the bond table file 35
to store the bond table data. Moreover, the receptor
information file 36 stores a list showing the relation
among receptor numbers of receptors, compound numbers
of compounds being agonists for the receptors, compound
numbers of compounds being antagonists for the
receptors, and additional information about the
receptors.
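
One way to read these descriptions is that the relation information file acts as an inverted index over the enzyme and receptor information files: the latter are keyed by enzyme or receptor number, the former by compound number. The following is a minimal sketch under that reading. The record classes, field names, and `build_relation_index` are hypothetical and are not the file layouts of Fig. 2 to Fig. 6.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EnzymeRecord:
    """Illustrative row of the enzyme information file."""
    enzyme_number: str
    substrates: list
    products: list
    reference_data: dict = field(default_factory=dict)


@dataclass
class ReceptorRecord:
    """Illustrative row of the receptor information file."""
    receptor_number: str
    agonists: list
    antagonists: list
    reference_data: dict = field(default_factory=dict)


def build_relation_index(enzymes, receptors):
    """Invert enzyme/receptor rows into rows keyed by compound number."""
    index = defaultdict(lambda: {
        "substrate_of": [], "product_of": [],
        "agonist_of": [], "antagonist_of": [],
    })
    for enzyme in enzymes:
        for compound in enzyme.substrates:
            index[compound]["substrate_of"].append(enzyme.enzyme_number)
        for compound in enzyme.products:
            index[compound]["product_of"].append(enzyme.enzyme_number)
    for receptor in receptors:
        for compound in receptor.agonists:
            index[compound]["agonist_of"].append(receptor.receptor_number)
        for compound in receptor.antagonists:
            index[compound]["antagonist_of"].append(receptor.receptor_number)
    return dict(index)
```

Keeping both directions on the medium trades some storage for lookups that never require scanning the whole enzyme or receptor file.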

The biochemical information processing program 
22 is recorded in the program area 2a. The biochemical 
information processing program 22 comprises the main 
program 23 for generally controlling the processing, the 
three-dimensional indication program 24 for three- 
dimensionally displaying the image data, the reaction 
scheme detection program 25 for detecting a chemical 
reaction scheme between compounds, the receptor
information detection program 26 for detecting the
additional information about receptors, and the reaction
path detection program 27 for detecting a reaction path of
plural compounds. The reaction scheme detection program 25
comprises the first process routine 25a to the fourth
process routine 25d, the receptor information detection
program 26 comprises the fifth process routine 26a to the
eighth process routine 26d, and the reaction path
detection program 27 comprises the ninth process routine
27a to the thirteenth process routine 27e.

A disk type recording medium such as a flexible disk or
a CD-ROM, for example, is used as the biochemical
information recording medium 2. Also, a tape type
recording medium such as a magnetic tape may be
applied.

The biochemical information recording medium 2 of 
the present embodiment can be used in the information 
processing apparatus 1 shown in Fig. 45 and Fig. 46. In 
detail, the information processing apparatus 1 has a 
medium drive device 3 and the biochemical information 
recording medium 2 can be loaded in the medium drive 
device 3. Then this loading enables access to the bio- 
chemical information recorded in the biochemical infor- 
mation recording medium 2 by the medium drive device 
3. This makes it possible to carry out the biochemical
information processing program 22 recorded in the
program area 2a by the information processing apparatus
1.

The structure of this information processing appa- 
ratus 1 is as follows. First, it is provided with the above- 
described medium drive device 3, the image memory 10 
for storing the image data indicating the molecular 
structure diagram or the like of a compound, the work
memory (inner memory) 11 with a resident operating
system (OS), and the display 40 as display means. Also, it
is provided with the input device 50 being input means 
having the mouse 51 for accepting input of image data 
and the keyboard 52 for accepting input of symbolic 
data, the printer 60 for outputting the image data or the 
like, and the CPU 70 for controlling execution or the like 
of the biochemical information processing program 22. 

The medium drive device 3 applied is a flexible disk 
drive device, a CD-ROM drive device, a magnetic tape 
drive device, or the like, depending upon the biochemi- 
cal information recording medium 2. 

The detailed structure of the compound information
file 31, enzyme information file 32, relation information
file 33, partial correlation data file 34, bond table file 35,
and receptor information file 36 recorded in the
biochemical information recording medium 2 of the present
embodiment is as described previously (Fig. 2 to Fig. 6).

The flow of data in the information processing
apparatus 1 is also as described previously (Fig. 7 to
Fig. 10): the input image data 80 is converted into
whichever of the bond table data 81, canonical data 82,
and three-dimensional data 83 is to be used, and the
canonical data 82 is mainly used in the biochemical
information processing program 22 recorded in the
program area 2a.

Next explained is the process of the biochemical
information processing program 22 recorded in the program
area 2a of the biochemical information recording
medium 2. This process is carried out by executing the
biochemical information processing program 22 read 
out by the medium drive device 3. This execution first 
starts the main program 23 of the biochemical informa- 
tion processing program 22. 

The details of the processes of the main program 23,
three-dimensional indication program 24, reaction
scheme detection program 25, reaction path detection
program 27, and receptor information detection program 26
thereafter are also as described previously (Fig. 11
to Fig. 15), and, for example as shown in Fig. 16 and
Fig. 17, reaction scheme data or the like is indicated
on the display 40.

Next explained is the canonical data preparation 
program suitably applicable to the present invention. 

The biochemical information computer program
product (recording medium) 2, being the embodiment of
the present invention and shown in Fig. 44, is provided
with the canonical data preparation program according
to the present invention: that is, the biochemical
information recording medium 2 is provided with the file
area 2b for storing files and the program area 2a for
storing programs. The bond table file 35, compound
information file 31, etc. are stored in the file area 2b.

A plurality of bond tables 81 can be recorded in the
bond table file 35. Recorded in a bond table 81 are
characteristic data about each of the atoms constituting a
compound and bond pair data between atoms, and the
canonical data preparation program 91 can access
these data through the bond table 81.

The compound information file 31 and bond table
81 are as described previously (Fig. 3, Fig. 18A, and
Fig. 18B).

The canonical data preparation program 91 is
stored in the program area 2a. The canonical data
preparation program 91 is a program for preparing the
canonical data, based on the characteristic data about
each of the atoms constituting the compound and the
bond pair data between atoms. This canonical data
preparation program 91 comprises the main routine 91a
for generally controlling the processes and the
constituent atom classification routine 91b for assigning a
class number to each of the atoms constituting a compound.
The canonical data preparation program 91 also comprises
the canonical number assignment routine 91c for
assigning a canonical number to each atom, based on
the class numbers, and the canonical data preparation
routine 91d for preparing the canonical data based on
the canonical numbers of the respective atoms.

The biochemical information recording medium 2
can be utilized in the information processing apparatus
1 shown in Fig. 45, as described previously. Pointing
devices other than the mouse 51 include a tablet, a
digitizer, a light pen, and so on, and the mouse 51 may be
replaced by any one of these devices.

The schematic operation of the information 
processing apparatus 1 is also as described previously. 

Next explained is the process of the canonical data
preparation program 91 stored in the program area 2a
of the biochemical information recording medium 2. This
process is carried out by executing the canonical data
preparation program 91 read out by the medium drive
device 3. This execution first starts the main routine 91a
of the canonical data preparation program 91.

The details of the processes of the main routine 91a,
constituent atom classification routine 91b, canonical
number assignment routine 91c, and canonical data
preparation routine 91d after that are also as described
previously (Fig. 20 to Fig. 37), and the canonical data for
uniquely specifying a compound can be obtained in a
short time.

The foregoing described the preferred embodiment
of the biochemical information computer program
product (recording medium) of the present invention, but it
is noted that the present invention is not limited to the
above embodiment.

For example, the canonical data preparation
program 91 according to the present invention does not
have to be present together with the biochemical
information processing program 22 according to the present
invention in a single medium, but the canonical data
preparation program 91 and biochemical information
processing program 22 according to the present
invention may be recorded respectively in separate media, as
shown in Fig. 47 and Fig. 48.

Namely, as shown in Fig. 48, the canonical data 
preparation program 91 according to the present inven- 
tion may be singly formed as a storage medium 2 for 
preparation of canonical data. In this case, the storage 
medium 2 for preparation of canonical data can be
utilized by the information processing apparatus 1 shown
in Fig. 49. Namely, the information processing apparatus
1 is provided with the medium drive device 3, and
the storage medium 2 for preparation of canonical data 
can be loaded in this device 3. Then this loading ena- 
bles the medium drive device 3 to access the informa- 
tion stored in the storage medium 2 for preparation of 
canonical data. This enables the information processing 
apparatus 1 to carry out the canonical data preparation 
program 91 stored in the program area 2a. The storage 
medium 2 for preparation of canonical data applicable 
is, for example, a disk type storage medium such as a 
flexible disk or a CD-ROM, or a tape type storage 
medium such as a magnetic tape. 

The biochemical information computer program
product (recording medium) of the present invention
does not have to comprise all of the reaction scheme
detection program 25, receptor information detection
program 26, and reaction path detection program 27, but
may be arranged, for example as shown in Fig. 50, to
comprise the reaction scheme detection means (the reaction
scheme detection program) 25 and the reaction path
detection means (the reaction path detection program)
27, or may be arranged to comprise only one of
them. In this case, the receptor information file 36 is not
necessary, and the main program 23 in this case is the
same as that shown in Fig. 11 except that it excludes
step S115 for calling the receptor information indication
program, as shown in Fig. 43.

Without having to be limited to the above
embodiments, the present invention can have a variety of
modifications. For example, an amino acid sequence for
defining the structure of an enzyme, or a base sequence,
may be recorded in the column of reference data in the
enzyme information file 32. Similarly, an amino acid
sequence for defining the structure of a receptor, or a
base sequence, may be recorded in the column of
reference data in the receptor information file 36. Recording
these sequences in the reference data makes possible
utilization in connection with genetic information.

An anomaly in a function of a specific enzyme could
cause a disease called dysbolism. Thus, information
about abnormal enzymes may be recorded in the column
of reference data in the enzyme information file 32 to be
used for search of dysbolism.

Further, the compound information file 31, enzyme 
information file 32, and relation information file 33 may 
include a record of information of conversion of foreign 
material occurring when a living body is dosed with the 
foreign material (which is a material not existing in living 
bodies originally). 

Furthermore, the compound information file 31,
enzyme information file 32, and relation information file 
33 may include a record of information concerning pro- 
duction or conversion of substance by enzyme or micro- 
organism. 

Furthermore, many drugs and agricultural chemi- 
cals themselves are enzyme inhibitors, agonists (ago- 
nistic materials), or antagonists (antagonistic 
materials). Then information about structures of drugs 
and agricultural chemicals or related information may 
be recorded as bio-related substances in the compound 
information file 31. 

Yet further, information concerning safety, such as 
toxicity of chemical substance, may be recorded in the 
column of reference data in the compound information 
file 31 and may be used in connection with behavior of 
substance in a living body system. 

Yet further, information in the field of nutrition may 
be recorded in the column of reference data of the
compound information file 31.

Furthermore, the indication method of reaction path 
may be modified, for example, in such a manner that the 
overall reaction path diagram is preliminarily prepared 
to be indicated in arbitrary position and scale and a 
desired reaction path part can be indicated by scrolling 
the screen top to bottom or left to right. The search of 
compound may adopt search by partial structure (partial 
identify search), search based on similarity, or the like. 
Further, the search of reaction path may be directed to 
a specific compound group, for example, such as 
metabolism of steroid. 

The present processing apparatus or the present 
processing method may also be used as a compound 
database system, and each information of the com- 
pound database system may be recorded in the 
medium of the present invention. In this case, it is
possible to perform search based on compound data such
as values of physical properties. Based on the
three-dimensional structure data of a compound, a
theoretical chemistry calculation function, such as
calculation of molecular orbitals or calculation of a
molecular force field, may be added to the present
processing apparatus or the present processing method.
Using the present processing apparatus or the present
processing method, one can also know a reaction path
when a specific enzyme is inhibited or inactivated or
when an enzyme is defective.

Furthermore, the biochemical information recording
medium of the present invention may include a record of
information for knowing the reaction path when a
specific enzyme is inhibited or inactivated or when an
enzyme is defective.

Industrial Applicability 

As detailed above, the biochemical information
processing apparatus and biochemical information
processing method of the present invention can
efficiently perform detection of reaction scheme, detection
of receptor information, and detection of reaction path.

Also, use of the biochemical information recording
medium of the present invention makes it possible to
efficiently perform the detection of reaction scheme,
detection of receptor information, and detection of
reaction path.

In the detection of reaction scheme, first, reference
is made to the list stored in the compound information
file to read out a compound number corresponding to
canonical data. Then, based on this compound number,
reference is made to the relation information file to read
out an enzyme number of an enzyme with this compound
being a substrate or a product. Further, based on
this enzyme number, reference is made to the enzyme
information file to read out information about this
enzyme. Then a chemical reaction scheme involving
this compound is obtained from the information about
the compound and enzyme thus read out.
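
The three lookups just described can be sketched as follows. This is an illustrative sketch only: the three dictionaries stand in for the compound, relation, and enzyme information files, and all keys, values, and the function name `detect_reaction_schemes` are hypothetical.

```python
# Hypothetical stand-ins for the three files; every value is made up.
COMPOUND_FILE = {"canon:A": "C1", "canon:B": "C2"}  # canonical data -> compound number
RELATION_FILE = {
    "C1": {"substrate_of": ["E1"], "product_of": []},
    "C2": {"substrate_of": [], "product_of": ["E1"]},
}
ENZYME_FILE = {
    "E1": {
        "substrates": ["C1"],
        "products": ["C2"],
        "reference_data": "hypothetical enzyme",
    },
}


def detect_reaction_schemes(canonical_data):
    """Return (substrates, enzyme number, products, enzyme info) tuples."""
    # Step 1: canonical data -> compound number (compound information file).
    compound_number = COMPOUND_FILE.get(canonical_data)
    if compound_number is None:
        return []  # structure not registered
    # Step 2: compound number -> enzyme numbers (relation information file).
    relation = RELATION_FILE.get(
        compound_number, {"substrate_of": [], "product_of": []}
    )
    schemes = []
    for enzyme_number in relation["substrate_of"] + relation["product_of"]:
        # Step 3: enzyme number -> partner compounds and additional
        # information (enzyme information file).
        enzyme = ENZYME_FILE[enzyme_number]
        schemes.append((
            enzyme["substrates"],
            enzyme_number,
            enzyme["products"],
            enzyme["reference_data"],
        ))
    return schemes
```

Each returned tuple carries what a reaction scheme diagram needs: the substrate side, the enzyme, the product side, and the enzyme's additional information.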

In this way, by mutual reference to the compound
information file, enzyme information file, and relation
information file, various information can be efficiently
acquired for an enzyme with a compound being a
substrate or a product even in the case of the structure of
the compound being used as a key.

Particularly, since the relation information file stores
the list showing the relationship between compounds
and enzymes with the compounds being substrates or
products, it is easy to search for the relationship among
a compound being a substrate, a compound being a
product, and an enzyme for changing the substrate to
the product, whereby a chemical reaction scheme can
be obtained efficiently.

In the detection of receptor information, first,
reference is made to the list stored in the compound
information file to read out a compound number corresponding
to canonical data. Next, based on this compound
number, reference is made to the relation information
file to read out a receptor number of a receptor with this
compound being an agonist or an antagonist. Further,
based on this receptor number, reference is made to the
receptor information file to read out the additional
information about this receptor. Then the additional
information about the receptor thus read out is indicated on the
display means.
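
The receptor lookup follows the same pattern with the receptor columns of the relation information file. A minimal sketch, again with hypothetical tables, identifiers, and function name (`detect_receptor_info`); the role label attached to each hit is an illustrative addition, not something the text prescribes.

```python
# Hypothetical stand-ins for the three files; every value is made up.
COMPOUND_FILE = {"canon:X": "C10"}  # canonical data -> compound number
RELATION_FILE = {"C10": {"agonist_of": ["R1"], "antagonist_of": ["R2"]}}
RECEPTOR_FILE = {
    "R1": {"reference_data": "hypothetical receptor 1"},
    "R2": {"reference_data": "hypothetical receptor 2"},
}


def detect_receptor_info(canonical_data):
    """Return (receptor number, role of the compound, additional info) tuples."""
    compound_number = COMPOUND_FILE.get(canonical_data)
    if compound_number is None:
        return []  # structure not registered
    relation = RELATION_FILE.get(compound_number, {})
    results = []
    for role, column in (("agonist", "agonist_of"), ("antagonist", "antagonist_of")):
        for receptor_number in relation.get(column, []):
            info = RECEPTOR_FILE[receptor_number]["reference_data"]
            results.append((receptor_number, role, info))
    return results
```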

In this way, by mutual reference to the compound
information file, receptor information file, and relation
information file, various information can be acquired
efficiently for a receptor with a compound being an
agonist or an antagonist even in the case of the structure of
the compound being used as a key.

Particularly, since the relation information file stores
the list showing the relationship between compounds
and receptors with the compounds being agonists or
antagonists, it is easy to search for the relationship
among a compound being an agonist, a compound
being an antagonist, and a receptor, whereby various
information about the receptor can be obtained
efficiently.

Further, in the detection of reaction path, first,
reference is made to the list stored in the compound
information file to read out a compound number corresponding
to canonical data. Next, based on this compound
number, reference is made to the relation information
file to read out each of an enzyme number of an enzyme
with this compound being a substrate and an enzyme
number of an enzyme with this compound being a
product. Further, based on these enzyme numbers,
reference is made to the enzyme information file to read out
a compound number of a compound being a substrate
and a compound number of a compound being a
product for every enzyme. Reading from the relation
information file and the enzyme information file is repetitively
carried out. Then, from a plurality of compound
numbers and a plurality of enzyme numbers thus read out, a
reaction path of these compounds is obtained.
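
The repeated reading can be sketched as a breadth-first traversal that alternates between the two files until no new compound or enzyme turns up. This is one possible reading, not the flow of the earlier flowcharts: the toy path (E1: C1 to C2, E2: C2 to C3, E3: C3 to C4), the table layouts, and `detect_reaction_path` are all hypothetical.

```python
from collections import deque

# Hypothetical enzyme information file for a toy linear path.
ENZYME_FILE = {
    f"E{i}": {"substrates": [f"C{i}"], "products": [f"C{i + 1}"]}
    for i in range(1, 4)
}

# Hypothetical relation information file, derived here for brevity.
RELATION_FILE = {}
for number, enzyme in ENZYME_FILE.items():
    for c in enzyme["substrates"]:
        RELATION_FILE.setdefault(c, {"substrate_of": [], "product_of": []})
        RELATION_FILE[c]["substrate_of"].append(number)
    for c in enzyme["products"]:
        RELATION_FILE.setdefault(c, {"substrate_of": [], "product_of": []})
        RELATION_FILE[c]["product_of"].append(number)


def detect_reaction_path(start_compound):
    """Return sorted (substrate, enzyme, product) steps reachable from start."""
    seen_compounds = {start_compound}
    seen_enzymes = set()
    steps = []
    queue = deque([start_compound])
    while queue:
        compound = queue.popleft()
        relation = RELATION_FILE.get(
            compound, {"substrate_of": [], "product_of": []}
        )
        # Relation file: compound number -> enzyme numbers.
        for enzyme_number in relation["substrate_of"] + relation["product_of"]:
            if enzyme_number in seen_enzymes:
                continue
            seen_enzymes.add(enzyme_number)
            # Enzyme file: enzyme number -> substrate and product numbers.
            enzyme = ENZYME_FILE[enzyme_number]
            for s in enzyme["substrates"]:
                for p in enzyme["products"]:
                    steps.append((s, enzyme_number, p))
            for neighbour in enzyme["substrates"] + enzyme["products"]:
                if neighbour not in seen_compounds:
                    seen_compounds.add(neighbour)
                    queue.append(neighbour)
    return sorted(steps)
```

Starting from any compound on the toy path recovers the whole path, since the traversal follows enzymes both upstream (compound as product) and downstream (compound as substrate). The visited sets keep the loop from re-reading the same records on cyclic paths.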

In this way, by mutual reference to the compound
information file, enzyme information file, and relation
information file, it is possible to efficiently search a
reaction path involving a plurality of compounds.

Particularly, since the relation information file stores
the list showing the relationship between compounds
and enzymes with the compounds being substrates or
products, it is easy to search for the relationship among
a compound being a substrate, a compound being a
product, and an enzyme for changing the substrate to
the product, whereby a reaction path involving a
plurality of compounds can be obtained efficiently.

Further, employing the canonical data preparation
means (the canonical data preparation program)
according to the present invention, the characteristic
data about each atom and the bonding pair data
between atoms, accepted through the input means, are
given to the canonical data preparation means. Then
the canonical data preparation means prepares the
canonical data based on these data within a short time.
Also, by the canonical data preparation method
according to the present invention, the canonical data is
prepared within a short time, based on the characteristic
data about each of the atoms constituting a compound and
the bonding pair data between atoms. As described, the
canonical data prepared by the canonical data
preparation means (the canonical data preparation program)
and the canonical data preparation method according to
the present invention is a very short string of characters,
numerals, and symbols, and the canonical data can be
saved within a small storage area. Therefore, if the
canonical data preparation means (the canonical data
preparation program) and the canonical data preparation
method according to the present invention are utilized
in a compound/reaction database system, the amount of
storage area used in the compound/reaction database
system can be decreased remarkably.

Claims 

1. A biochemical information processing apparatus
comprising

storage means for storing biochemical information
about compounds and enzymes,

input means for accepting input of image data
indicating said biochemical information or symbolic
data indicating said biochemical information,

reaction scheme detection means for, when
said input means accepts data about a compound
being a substrate and/or a product,
detecting a chemical reaction scheme involving
said compound, based on the data, and

display means for indicating at least a reaction
scheme diagram of the chemical reaction
scheme;

wherein said storage means comprises

a compound information file storing a list
showing the relation between compound
numbers of compounds and canonical
data corresponding to said compounds,
and additional information about said
compounds,

an enzyme information file storing a list
showing the relation among enzyme numbers
of enzymes, compound numbers of
compounds being substrates for said
enzymes, and compound numbers of
compounds being products by said enzymes,
and additional information about said
enzymes, and

a relation information file storing a list
showing the relation among compound
numbers of compounds as a key, enzyme
numbers of enzymes with either said
compound being a substrate, and enzyme
numbers of enzymes with either said
compound being a product; and

wherein said reaction scheme
detection means comprises

a first process portion for preparing
from the data about a compound
accepted through said input means
said canonical data uniquely indicating
a chemical structure of said compound,
further searching said compound
information file, based on the
canonical data, and thereby reading
out a compound number corresponding
to said canonical data when said
canonical data exists in said compound
information file,

a second process portion for reading
an enzyme number of an enzyme with
the compound being a substrate or a
product out of said relation information
file, based on the compound number
read out in said first process portion,

a third process portion for reading a
compound number of another compound
constituting a reaction system
together with the enzyme of the
enzyme number read out in said second
process portion and said compound,
and additional information
about said enzyme out of said enzyme
information file, and

a fourth process portion for indicating
a reaction scheme diagram of the
compound accepted through said
input means on said display means
from the compound number read out
in said first process portion, the
enzyme number read out in said second
process portion, and the compound
number of the another
compound read out in said third process
portion, and further indicating the
additional information about the
enzyme read out in said third process
portion on said display means.

2. The biochemical information processing apparatus
according to Claim 1, said biochemical information
processing apparatus further comprising receptor
information detection means for, when said input
means accepts data about a compound, detecting
additional information about a receptor with said
compound being an agonist and/or an antagonist,
based on the data;

wherein said storage means further stores
biochemical information about receptors, and
further comprises a receptor information file
storing a list showing the relation between
receptor numbers of receptors and compound
numbers of compounds being agonists and/or
antagonists for said receptors, and additional
information about said receptors;

wherein said relation information
file stores a list to show the relation among the
compound numbers of the compounds as a
key, the enzyme numbers of the enzymes with
either said compound being a substrate, the
enzyme numbers of the enzymes with either
said compound being a product, the receptor
numbers of the receptors with either said
compound being an agonist, and the receptor
numbers of the receptors with either said
compound being an antagonist; and

wherein said receptor information
detection means comprises

a fifth process portion for preparing from
data about a compound accepted through
said input means said canonical data
uniquely indicating a chemical structure of
said compound, further searching said
compound information file, based on said
canonical data, and thereby reading out a
compound number corresponding to said
canonical data when said canonical data
exists in said compound information file,

a sixth process portion for reading, based
on the compound number read out in said
fifth process portion, a receptor number of
a receptor with the compound being an
agonist or an antagonist out of said relation
information file,

a seventh process portion for reading at
least additional information about the
receptor of the receptor number read out in
said sixth process portion out of said
receptor information file, and

an eighth process portion for indicating at
least the additional information about the
receptor read out in said seventh process
portion on said display means.

3. The biochemical information processing apparatus
according to Claim 1 or 2, said biochemical
information processing apparatus further comprising
reaction path detection means for, when said input
means accepts data about a predetermined compound
selected from a plurality of compounds
constituting a reaction path, detecting the reaction path
of said plurality of compounds, based on the data;

wherein said reaction path detection means
comprises

a ninth process portion for preparing from the
data about the compound accepted through
said input means said canonical data uniquely
indicating a chemical structure of said compound,
further searching said compound information
file, based on the canonical data, and
thereby reading out a compound number
corresponding to said canonical data when said
canonical data exists in said compound
information file,

a tenth process portion for reading, based on
the compound number read out in said ninth
process portion, an enzyme number of an
enzyme with the compound being a substrate
and an enzyme number of an enzyme with the
compound being a product out of said relation
information file,

an eleventh process portion for reading, based
on each enzyme number read out in said tenth
process portion, a compound number of a
compound being a substrate for said enzyme
and a compound number of a compound being
a product by said enzyme out of said enzyme
information file,

a twelfth process portion for repeating a
process by said tenth process portion and a
process by said eleventh process portion to obtain
compounds and enzymes within the
predetermined reaction path, and

a thirteenth process portion for indicating from
enzyme numbers read out in said tenth
process portion and compound numbers read out in
said eleventh process portion a reaction
scheme diagram of these compounds along
the reaction path on said display means.


4. A biochemical information processing apparatus 
comprising 

storage means for storing biochemical information
about compounds and enzymes,

input means for accepting input of image data
indicating said biochemical information or symbolic
data indicating said biochemical information,

reaction path detection means for, when said
input means accepts data about a predetermined
compound selected from a plurality of
compounds constituting a reaction path,
detecting the reaction path of said plurality of
compounds, based on the data, and

display means for indicating at least a reaction
scheme diagram of the chemical reaction
scheme;

wherein said storage means comprises

a compound information file storing a list
showing the relation between compound
numbers of compounds and canonical
data corresponding to said compounds,
and additional information about said
compounds,

an enzyme information file storing a list
showing the relation among enzyme numbers
of enzymes, compound numbers of
compounds being substrates for said
enzymes, and compound numbers of
compounds being products by said enzymes,
and additional information about said
enzymes, and

a relation information file storing a list
showing the relation among compound
numbers of compounds as a key, enzyme
numbers of enzymes with either said
compound being a substrate, and enzyme
numbers of enzymes with either said
compound being a product; and

wherein said reaction path detec- 
tion means comprises 

a ninth process portion for preparing
from the data about the compound
accepted through said input means
said canonical data uniquely indicating
a chemical structure of said compound,
further searching said compound
information file, based on the
canonical data, and thereby reading
out a compound number corresponding
to said canonical data when said
canonical data exists in said compound
information file,
a tenth process portion for reading, 
based on the compound number read 
out in said ninth process portion, an 
enzyme number of an enzyme with the 
compound being a substrate and an 
enzyme number of an enzyme with the 
compound being a product out of said 
relation information file, 
an eleventh process portion for read- 
ing, based on each enzyme number 
read out in said tenth process portion, 
a compound number of a compound 
being a substrate for said enzyme and 
a compound number of a compound 
being a product by said enzyme out of 
said enzyme information file, 
a twelfth process portion for repeating 
a process by said tenth process por- 
tion and a process by said eleventh 
process portion to obtain compounds 
and enzymes within the predeter- 
mined reaction path, and 
a thirteenth process portion for indicating
from enzyme numbers read out in
said tenth process portion and compound
numbers read out in said eleventh
process portion a reaction
scheme diagram of these compounds
along the reaction path on said display
means.

5. The biochemical information processing apparatus
according to Claim 4, said biochemical information
processing apparatus further comprising receptor
information detection means for, when said input
means accepts data about a compound, detecting
additional information about a receptor with said
compound being an agonist and/or an antagonist,
based on the data;

wherein said storage means further stores
biochemical information about receptors, and

further comprises a receptor information file
storing a list showing the relation between
receptor numbers of receptors and compound
numbers of compounds being agonists and/or
antagonists for said receptors, and additional
information about said receptors;

wherein said relation information file
stores a list to show the relation among the
compound numbers of the compounds as a
key, the enzyme numbers of the enzymes with
either said compound being a substrate, the
enzyme numbers of the enzymes with either
said compound being a product, the receptor
numbers of the receptors with either said compound
being an agonist, and the receptor numbers
of the receptors with either said
compound being an antagonist; and

wherein said receptor information detection
means comprises

a fifth process portion for preparing from
data about a compound accepted through
said input means said canonical data
uniquely indicating a chemical structure of
said compound, further searching said
compound information file, based on said
canonical data, and thereby reading out a
compound number corresponding to said
canonical data when said canonical data
exists in said compound information file,
a sixth process portion for reading, based
on the compound number read out in said
fifth process portion, a receptor number of
a receptor with the compound being an
agonist or an antagonist out of said relation
information file,

a seventh process portion for reading at
least additional information about the
receptor of the receptor number read out in
said sixth process portion out of said
receptor information file, and
an eighth process portion for indicating at
least the additional information about the
receptor read out in said seventh process
portion on said display means. 
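
As an illustration only, the fifth to eighth process portions recited above can be sketched in Python as a chain of table lookups. The table layouts, the field names, the numbers and the "canon:..." placeholder strings are assumptions made for this sketch and do not come from the specification; the pharmacology used as sample data (acetylcholine as an agonist and atropine as an antagonist of the muscarinic acetylcholine receptor) is standard textbook knowledge.

```python
# Illustrative tables only: numbers, field names and canonical strings are assumptions.
COMPOUND_FILE = {          # canonical data -> compound number
    "canon:acetylcholine": 1,
    "canon:atropine": 2,
}
RELATION_FILE = {          # compound number (key) -> receptor numbers by role
    1: {"agonist_of": [10], "antagonist_of": []},
    2: {"agonist_of": [], "antagonist_of": [10]},
}
RECEPTOR_FILE = {          # receptor number -> additional information
    10: {"name": "muscarinic acetylcholine receptor",
         "info": "G protein-coupled receptor"},
}


def receptor_information(canonical_data):
    """Return (receptor number, role, name, info) tuples for one compound."""
    # Fifth portion: canonical data -> compound number, only if it is on file.
    compound_no = COMPOUND_FILE.get(canonical_data)
    if compound_no is None:
        return []
    # Sixth portion: compound number -> receptor numbers via the relation file.
    relation = RELATION_FILE.get(compound_no, {})
    hits = []
    for role, column in (("agonist", "agonist_of"), ("antagonist", "antagonist_of")):
        for receptor_no in relation.get(column, []):
            # Seventh portion: receptor number -> additional information.
            record = RECEPTOR_FILE[receptor_no]
            hits.append((receptor_no, role, record["name"], record["info"]))
    return hits


def display(hits):
    """Eighth portion: format the information for the display means."""
    return [f"receptor {no}: {name} ({role}); {info}" for no, role, name, info in hits]
```

Calling display(receptor_information("canon:atropine")) would yield a single line describing receptor 10 with the role "antagonist". Keying the relation file by compound number is what lets the sixth portion avoid scanning every receptor record.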

6. The biochemical information processing apparatus
according to any one of Claims 1 to 5,

wherein said input means accepts input of 
characteristic data about each of atoms constituting 
a compound and bonding pair data between atoms; 

wherein said biochemical information
processing apparatus further comprises canonical
data preparation means for preparing canonical
data capable of uniquely specifying a chemical
structure of said compound, based on each data
accepted through said input means; and

wherein said canonical data preparation 
means comprises 

a constituent atom classification process portion
for classifying, based on each data
accepted through said input means, the atoms
into different classes each for equivalent atoms
and assigning, to each atom, a different class
number for each class,

a canonical number assignment process portion
for assigning canonical numbers uniquely
corresponding to the structure of said compound
to the respective atoms, based on the
class numbers assigned to the respective
atoms in said constituent atom classification
process portion, and

a canonical data preparation process portion 
for preparing said canonical data, based on the 
canonical numbers assigned to the respective 
atoms in said canonical number assignment 
process portion. 

7. The biochemical information processing apparatus
according to Claim 6,

wherein said constituent atom classification
process portion assigns three types of attributes
(a_i, b_ij, d_ij) to each atom and, utilizing the fact that
atoms different in even only one of these attributes
can be determined to be not equivalent, assigns a
different class number for each equivalent atom to
each atom,

where among said three types of attributes
(a_i, b_ij, d_ij), a_i is a kind number of an atom of input
number i, b_ij is the number of bonds adjoining the
atom of input number i and having a bond kind
number being j, and d_ij is the number of routes that
can be traced from the atom of input number i
through j bonds in the shortest path;

wherein said canonical number assignment
process portion is arranged so that when in a process
for assigning a canonical number to each atom
in the ascending order from 1 the canonical number
1 is given to an atom with a highest priority of said
class number and thereafter canonical numbers up
to the canonical number n are assigned in that
manner, said canonical number assignment process
portion selects an atom with a minimum canonical
number out of atoms already having their
respective canonical numbers and bonding to an
atom having no canonical number yet and then
gives a canonical number n + 1 to an atom with a
highest priority of said class number out of atoms
bonding to said selected atom and having no
canonical number yet; and

wherein said canonical data preparation
process portion gives three types of attributes (P_i,
T_i, S_i) to each atom and aligns these attributes in
line to prepare said canonical data,

where among said three types of attributes
(P_i, T_i, S_i), P_i is a canonical number of an atom
bonding to an atom of canonical number i and having
a minimum canonical number, T_i is a symbol for
a type of a bond between the atom of canonical
number i and the atom of canonical number P_i, and
S_i is a symbol for a kind of the atom of canonical
number i.
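
The three stages recited above (atom classification, canonical number assignment, canonical data preparation) can be illustrated with the following Python sketch. It is a reading of the claim wording under stated assumptions, not the applicant's implementation: the atom kind is taken to be the element symbol, d_ij is interpreted as the number of atoms at shortest-path distance j from atom i, a smaller class number is treated as the higher priority, ties between atoms of the same class are broken by input order, and an isolated atom is given P = 0. With this simple tie-break the result is only guaranteed to be order-independent when tied atoms are truly symmetric.

```python
from collections import deque


def classify(atoms, bonds):
    """Stage 1: class numbers from the attributes (a_i, b_ij, d_ij).

    atoms: list of element symbols; bonds: {(i, j): bond symbol}.
    """
    adj = {i: {} for i in range(len(atoms))}
    for (i, j), kind in bonds.items():
        adj[i][j] = kind
        adj[j][i] = kind
    bond_kinds = sorted(set(bonds.values()))
    keys = []
    for i, symbol in enumerate(atoms):
        # b_ij: number of bonds of each kind adjoining atom i.
        b = tuple(sum(1 for k in adj[i].values() if k == kind) for kind in bond_kinds)
        # d_ij (assumed reading): atoms at shortest-path distance j, found by BFS.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        d = tuple(sum(1 for x in dist.values() if x == j)
                  for j in range(1, max(dist.values()) + 1))
        keys.append((symbol, b, d))
    ranking = sorted(set(keys))
    return adj, [ranking.index(k) + 1 for k in keys]


def canonical_numbers(adj, class_no):
    """Stage 2: number atoms 1..n, growing outward from the best-class atom."""
    n = len(class_no)
    canon = {min(range(n), key=lambda i: (class_no[i], i)): 1}
    while len(canon) < n:
        frontier = [i for i in canon if any(v not in canon for v in adj[i])]
        if not frontier:
            raise ValueError("structure is not connected")
        selected = min(frontier, key=lambda i: canon[i])
        nxt = min((v for v in adj[selected] if v not in canon),
                  key=lambda v: (class_no[v], v))
        canon[nxt] = len(canon) + 1
    return canon


def canonical_data(atoms, adj, canon):
    """Stage 3: line up (P_i, T_i, S_i) in canonical-number order."""
    by_number = {c: i for i, c in canon.items()}
    parts = []
    for c in range(1, len(atoms) + 1):
        i = by_number[c]
        if adj[i]:
            p = min(adj[i], key=lambda v: canon[v])
            parts.append(f"{canon[p]}{adj[i][p]}{atoms[i]}")
        else:
            parts.append(f"0{atoms[i]}")
    return " ".join(parts)


def canonicalize(atoms, bonds):
    adj, class_no = classify(atoms, bonds)
    return canonical_data(atoms, adj, canonical_numbers(adj, class_no))
```

For example, the heavy-atom skeleton of ethanol entered as C, C, O or as O, C, C produces the same string, which is the property that allows the canonical data to serve as a search key in the compound information file.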

8. A biochemical information processing method
using an information processing apparatus comprising

storage means for storing biochemical information
about compounds and enzymes,
input means for accepting input of image data 
indicating said biochemical information or sym- 
bolic data indicating said biochemical informa- 
tion, and 

display means for indicating at least a reaction
scheme diagram of a chemical reaction
scheme;

wherein said storage means comprises 

a compound information file storing a list
showing the relation between compound
numbers of compounds and canonical
data corresponding to said compounds,
and additional information about said compounds,
an enzyme information file storing a list
showing the relation among enzyme numbers
of enzymes, compound numbers of
compounds being substrates for said
enzymes, and compound numbers of compounds
being products by said enzymes,
and additional information about said
enzymes, and

a relation information file storing a list
showing the relation among compound
numbers of compounds as a key, enzyme
numbers of enzymes with either said compound
being a substrate, and enzyme
numbers of enzymes with either said compound
being a product; and

wherein said biochemical information
processing method comprises



a first step for, when said input means
accepts data about a compound being
a substrate and/or a product, preparing
said canonical data uniquely indicating
a chemical structure of said
compound from the data, further
searching said compound information
file, based on the canonical data, and
thereby reading out a compound
number corresponding to said canonical
data when said canonical data
exists in said compound information
file,

a second step for reading an enzyme
number of an enzyme with the compound
being a substrate or a product
out of said relation information file,
based on the compound number read
out in said first step,
a third step for reading a compound
number of another compound constituting
a reaction system together with
the enzyme of the enzyme number
read out in said second step and said
compound, and additional information
about said enzyme out of said enzyme
information file, and
a fourth step for indicating a reaction
scheme diagram of the compound
accepted through said input means on
said display means from the compound
number read out in said first
step, the enzyme number read out in
said second step, and the compound
number of the other compound read
out in said third step, and further indicating
the additional information about
the enzyme read out in said third step 
on said display means. 
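
By way of illustration only, the first to fourth steps above can be sketched in Python with the three files held as dictionaries. The layouts, field names, numbers and "canon:..." placeholder strings are assumptions for this sketch. Deriving the relation information file by inverting the enzyme information file is also an assumption: the claim only states what the relation file stores, not how it is built. The sample reactions (ethanol to acetaldehyde by alcohol dehydrogenase, acetaldehyde to acetate by aldehyde dehydrogenase) are simplified by omitting cofactors.

```python
# Illustrative files; all numbers, keys and canonical strings are hypothetical.
COMPOUND_FILE = {
    1: {"canonical": "canon:ethanol", "name": "ethanol"},
    2: {"canonical": "canon:acetaldehyde", "name": "acetaldehyde"},
    3: {"canonical": "canon:acetate", "name": "acetate"},
}
ENZYME_FILE = {
    101: {"name": "alcohol dehydrogenase", "substrates": [1], "products": [2],
          "info": "EC 1.1.1.1"},
    102: {"name": "aldehyde dehydrogenase", "substrates": [2], "products": [3],
          "info": "EC 1.2.1.3"},
}


def build_relation_file(enzyme_file):
    """Invert the enzyme file so that the compound number becomes the key."""
    relation = {}
    for enzyme_no, record in enzyme_file.items():
        for role, column in (("substrates", "substrate_of"), ("products", "product_of")):
            for compound_no in record[role]:
                entry = relation.setdefault(
                    compound_no, {"substrate_of": [], "product_of": []})
                entry[column].append(enzyme_no)
    return relation


RELATION_FILE = build_relation_file(ENZYME_FILE)
CANONICAL_INDEX = {rec["canonical"]: no for no, rec in COMPOUND_FILE.items()}


def reaction_scheme(canonical_data):
    """Return one text line per reaction involving the queried compound."""
    # First step: canonical data -> compound number.
    compound_no = CANONICAL_INDEX.get(canonical_data)
    if compound_no is None:
        return []
    # Second step: enzymes for which the compound is a substrate or a product.
    relation = RELATION_FILE.get(compound_no, {"substrate_of": [], "product_of": []})
    enzyme_nos = sorted(set(relation["substrate_of"]) | set(relation["product_of"]))
    lines = []
    for enzyme_no in enzyme_nos:
        # Third step: the other compounds and the enzyme's additional information.
        enzyme = ENZYME_FILE[enzyme_no]
        left = " + ".join(COMPOUND_FILE[n]["name"] for n in enzyme["substrates"])
        right = " + ".join(COMPOUND_FILE[n]["name"] for n in enzyme["products"])
        # Fourth step: a textual stand-in for the reaction scheme diagram.
        lines.append(f"{left} --[{enzyme['name']}, {enzyme['info']}]--> {right}")
    return lines
```

Querying reaction_scheme("canon:acetaldehyde") returns two lines, one where acetaldehyde is the product and one where it is the substrate. Because the relation file is keyed by compound number, the second step is a single lookup rather than a scan over every enzyme record.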

9. The biochemical information processing method 
according to Claim 8, 

wherein said storage means further stores 
biochemical information about receptors, and 

further comprises a receptor information file
storing a list showing the relation between
receptor numbers of receptors and compound
numbers of compounds being agonists and/or
antagonists for said receptors, and additional
information about said receptors;

wherein said relation information file
stores a list to show the relation among the
compound numbers of the compounds as a
key, the enzyme numbers of the enzymes with
either said compound being a substrate, the
enzyme numbers of the enzymes with either
said compound being a product, the receptor
numbers of the receptors with either said compound
being an agonist, and the receptor numbers
of the receptors with either said
compound being an antagonist; and

wherein said biochemical information
processing method further comprises

a fifth step for, when said input means
accepts data about a compound, preparing
said canonical data uniquely indicating
a chemical structure of said compound
from the data, further searching said compound
information file, based on said
canonical data, and thereby reading out a
compound number corresponding to said
canonical data when said canonical data
exists in said compound information file,
a sixth step for reading, based on the compound
number read out in said fifth step, a
receptor number of a receptor with the
compound being an agonist or an antagonist
out of said relation information file,
a seventh step for reading at least additional
information about the receptor of the
receptor number read out in said sixth step
out of said receptor information file, and
an eighth step for indicating at least the
additional information about the receptor
read out in said seventh step on said display
means.



10. The biochemical information processing method
according to Claim 8 or 9, said biochemical information
processing method further comprising


a ninth step for, when said input means accepts
data about a predetermined compound
selected from a plurality of compounds constituting
a reaction path, preparing said canonical
data uniquely indicating a chemical structure of
said compound from the data, further searching
said compound information file, based on
the canonical data, and thereby reading out a
compound number corresponding to said
canonical data when said canonical data exists
in said compound information file,
a tenth step for reading, based on the compound
number read out in said ninth step, an
enzyme number of an enzyme with the compound
being a substrate and an enzyme
number of an enzyme with the compound
being a product out of said relation information
file,

an eleventh step for reading, based on each
enzyme number read out in said tenth step, a
compound number of a compound being a substrate
for said enzyme and a compound
number of a compound being a product by said
enzyme out of said enzyme information file,
a twelfth step for repeating a process by said
tenth step and a process by said eleventh step
to obtain compounds and enzymes within the
predetermined reaction path, and
a thirteenth step for indicating from enzyme 
numbers read out in said tenth step and com- 
pound numbers read out in said eleventh step a 
reaction scheme diagram of these compounds 
along the reaction path on said display means. 

11. A biochemical information processing method 
using an information processing apparatus com- 
prising 

storage means for storing biochemical information
about compounds and enzymes,
input means for accepting input of image data 
indicating said biochemical information or sym- 
bolic data indicating said biochemical informa- 
tion, and 

display means for indicating at least a reaction 
scheme diagram of a chemical reaction 
scheme; 

wherein said storage means comprises 

a compound information file storing a list
showing the relation between compound
numbers of compounds and canonical
data corresponding to said compounds,
and additional information about said compounds,

an enzyme information file storing a list
showing the relation among enzyme numbers
of enzymes, compound numbers of
compounds being substrates for said
enzymes, and compound numbers of compounds
being products by said enzymes,
and additional information about said
enzymes, and

a relation information file storing a list
showing the relation among compound
numbers of compounds as a key, enzyme
numbers of enzymes with either said compound
being a substrate, and enzyme
numbers of enzymes with either said compound
being a product; and

wherein said biochemical informa- 
tion processing method comprises 

a ninth step for, when said input
means accepts data about a predetermined
compound selected from a plurality
of compounds constituting a
reaction path, preparing said canonical
data uniquely indicating a chemical
structure of said compound from the
data, further searching said compound
information file, based on the
canonical data, and thereby reading
out a compound number corresponding
to said canonical data when said
canonical data exists in said compound
information file,
a tenth step for reading, based on the
compound number read out in said
ninth step, an enzyme number of an
enzyme with the compound being a
substrate and an enzyme number of
an enzyme with the compound being a
product out of said relation information
file,

an eleventh step for reading, based on
each enzyme number read out in said
tenth step, a compound number of a
compound being a substrate for said
enzyme and a compound number of a
compound being a product by said
enzyme out of said enzyme information
file,

a twelfth step for repeating a process
by said tenth step and a process by
said eleventh step to obtain compounds
and enzymes within the predetermined
reaction path, and
a thirteenth step for indicating from
enzyme numbers read out in said
tenth step and compound numbers
read out in said eleventh step a reac- 
tion scheme diagram of these com- 
pounds along the reaction path on 
said display means. 
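
The repetition recited in the twelfth step amounts to a graph traversal that alternates between the relation information file and the enzyme information file. The following Python sketch shows one way to read it, using a breadth-first search; the table layouts, the numbering and the max_rounds bound (standing in for "within the predetermined reaction path") are assumptions of this sketch, and the sample data are the first three reactions of glycolysis with cofactors omitted.

```python
from collections import deque

# Hypothetical files; compound and enzyme numbers are made up for illustration.
COMPOUND_NAMES = {
    1: "glucose",
    2: "glucose 6-phosphate",
    3: "fructose 6-phosphate",
    4: "fructose 1,6-bisphosphate",
}
ENZYME_FILE = {  # enzyme number -> (name, substrate numbers, product numbers)
    201: ("hexokinase", [1], [2]),
    202: ("glucose-6-phosphate isomerase", [2], [3]),
    203: ("6-phosphofructokinase", [3], [4]),
}
RELATION_FILE = {  # compound number (key) -> enzyme numbers by role
    1: {"substrate_of": [201], "product_of": []},
    2: {"substrate_of": [202], "product_of": [201]},
    3: {"substrate_of": [203], "product_of": [202]},
    4: {"substrate_of": [], "product_of": [203]},
}


def reaction_path(start_no, max_rounds=10):
    """Collect the compounds and enzymes reachable from one compound number."""
    seen_compounds = {start_no}
    seen_enzymes = set()
    queue = deque([(start_no, 0)])
    while queue:
        compound_no, depth = queue.popleft()
        if depth >= max_rounds:
            continue
        relation = RELATION_FILE.get(
            compound_no, {"substrate_of": [], "product_of": []})
        # Tenth step: enzymes with this compound as substrate or as product.
        for enzyme_no in relation["substrate_of"] + relation["product_of"]:
            if enzyme_no in seen_enzymes:
                continue
            seen_enzymes.add(enzyme_no)
            # Eleventh step: substrates and products of that enzyme.
            _, substrates, products = ENZYME_FILE[enzyme_no]
            for other in substrates + products:
                if other not in seen_compounds:
                    seen_compounds.add(other)
                    queue.append((other, depth + 1))  # twelfth step: repeat
    return seen_compounds, seen_enzymes


def render(enzyme_nos):
    """Thirteenth step: a textual stand-in for the reaction scheme diagram."""
    lines = []
    for enzyme_no in sorted(enzyme_nos):
        name, substrates, products = ENZYME_FILE[enzyme_no]
        left = " + ".join(COMPOUND_NAMES[n] for n in substrates)
        right = " + ".join(COMPOUND_NAMES[n] for n in products)
        lines.append(f"{left} --[{name}]--> {right}")
    return lines
```

Starting from fructose 6-phosphate (compound 3), the traversal reaches all four compounds and all three enzymes; with max_rounds=1 it stops at the immediate neighbours. Tracking already visited enzymes and compounds keeps the repetition from looping on cyclic pathways.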


12. The biochemical information processing method 
according to Claim 11, 

wherein said storage means further stores 
biochemical information about receptors, and 


further comprises a receptor information file
storing a list showing the relation between
receptor numbers of receptors and compound
numbers of compounds being agonists and/or
antagonists for said receptors, and additional
information about said receptors;

wherein said relation information file
stores a list to show the relation among the
compound numbers of the compounds as a
key, the enzyme numbers of the enzymes with
either said compound being a substrate, the
enzyme numbers of the enzymes with either
said compound being a product, the receptor
numbers of the receptors with either said compound
being an agonist, and the receptor numbers
of the receptors with either said
compound being an antagonist; and

wherein said biochemical information
processing method further comprises

a fifth step for, when said input means 
accepts data about a compound, prepar- 
ing said canonical data uniquely indicating 
a chemical structure of said compound 
from the data, further searching said com- 
pound information file, based on said 
canonical data, and thereby reading out a 
compound number corresponding to said 
canonical data when said canonical data 
exists in said compound information file, 
a sixth step for reading, based on the compound
number read out in said fifth step, a
receptor number of a receptor with the
compound being an agonist or an antagonist
out of said relation information file,
a seventh step for reading at least addi- 
tional information about the receptor of the 
receptor number read out in said sixth step 
out of said receptor information file, and 
an eighth step for indicating at least the 
additional information about the receptor 
read out in said seventh step on said dis- 
play means. 

13. The biochemical information processing method 
according to any one of Claims 8 to 12, 

wherein said input means accepts input of 
characteristic data about each of atoms constituting
a compound and bonding pair data between atoms; 
and 

wherein said biochemical information
processing method further comprises

a constituent atom classification step for classi- 
fying, based on each data accepted through 
said input means, the atoms into different 
classes each for equivalent atoms and assign- 
ing, to each atom, a different class number for 
each class, 

a canonical number assignment step for
assigning canonical numbers uniquely corresponding
to the structure of said compound to
the respective atoms, based on the class numbers
assigned to the respective atoms in said
constituent atom classification step, and
a canonical data preparation step for preparing
said canonical data capable of uniquely specifying
a chemical structure of said compound,
based on the canonical numbers assigned to
the respective atoms in said canonical number
assignment step.

14. The biochemical information processing method 
according to Claim 13, 

wherein said constituent atom classification
step assigns three types of attributes (a_i, b_ij, d_ij) to
each atom and, utilizing the fact that atoms different
in even only one of these attributes can be determined
to be not equivalent, assigns a different class
number for each equivalent atom to each atom,

where among said three types of attributes
(a_i, b_ij, d_ij), a_i is a kind number of an atom of input
number i, b_ij is the number of bonds adjoining the
atom of input number i and having a bond kind
number being j, and d_ij is the number of routes that
can be traced from the atom of input number i
through j bonds in the shortest path;

wherein said canonical number assignment
step is arranged so that when in a process for
assigning a canonical number to each atom in the
ascending order from 1 the canonical number 1 is
given to an atom with a highest priority of said class
number and thereafter canonical numbers up to the
canonical number n are assigned in that manner,
said canonical number assignment step selects an
atom with a minimum canonical number out of
atoms already having their respective canonical
numbers and bonding to an atom having no canonical
number yet and then gives a canonical number
n + 1 to an atom with a highest priority of said class
number out of atoms bonding to said selected atom
and having no canonical number yet; and

wherein said canonical data preparation
step gives three types of attributes (P_i, T_i, S_i) to
each atom and aligns these attributes in line to prepare
said canonical data,

where among said three types of attributes
(P_i, T_i, S_i), P_i is a canonical number of an atom
bonding to an atom of canonical number i and having
a minimum canonical number, T_i is a symbol for
a type of a bond between the atom of canonical
number i and the atom of canonical number P_i, and
S_i is a symbol for a kind of the atom of canonical
number i. 

15. A biochemical information computer program prod- 
uct used with an information processing apparatus 
comprising input means for accepting input of 
image data indicating biochemical information or
symbolic data indicating biochemical information, 
display means for indicating at least a reaction 
scheme diagram of a chemical reaction scheme, 
and reading means for reading information out of a 
computer-usable medium; 

said computer program product comprising the 
computer-usable medium having a file area for 
recording a file and a program area for record- 
ing a program and having computer-readable 
file and program embodied in said medium, for 
letting at least a reaction scheme diagram effi- 
ciently be searched for and be indicated by said 
display means, based on data input through 
said input means; 



said computer program product having, 
in said file area, 

a computer-readable compound information
file for storing a list showing the relation
between compound numbers of compounds
and canonical data corresponding to said compounds,
and additional information about said
compounds,

a computer-readable enzyme information file 
for storing a list showing the relation among 
enzyme numbers of enzymes, compound num- 
bers of compounds being substrates for said 
enzymes, and compound numbers of com- 
pounds being products by said enzymes, and 
additional information about said enzymes, and 
a computer-readable relation information file 
for storing a list showing the relation among 
compound numbers of compounds as a key, 
enzyme numbers of enzymes with either said 
compound being a substrate, and enzyme 
numbers of enzymes with either said com- 
pound being a product, and 
having, in said program area, 
a computer-readable reaction scheme detection
program for, when said input means
accepts data about a compound being a substrate
and/or a product, detecting a chemical
reaction scheme involving said compound,
based on the data;

wherein said reaction scheme detection
program comprises

a first computer-readable process routine
for preparing from the data about a compound
accepted through said input means
said canonical data uniquely indicating a
chemical structure of said compound, further
searching said compound information
file, based on the canonical data, and
thereby reading out a compound number
corresponding to said canonical data when
said canonical data exists in said compound
information file,
a second computer-readable process routine
for reading an enzyme number of an
enzyme with the compound being a substrate
or a product out of said relation information
file, based on the compound
number read out in said first process routine,

a third computer-readable process routine
for reading a compound number of another
compound constituting a reaction system
together with the enzyme of the enzyme
number read out in said second process
routine and said compound, and additional
information about said enzyme out of said
enzyme information file, and
a fourth computer-readable process routine
for indicating a reaction scheme diagram
of the compound accepted through
said input means on said display means
from the compound number read out in
said first process routine, the enzyme
number read out in said second process
routine, and the compound number of the
other compound read out in said third
process routine, and further indicating the
additional information about the enzyme
read out in said third process routine on
said display means.

16. The biochemical information computer program
product according to Claim 15,

said computer program product further having, 
in said file area, 

a computer-readable receptor information file
storing a list showing the relation between
receptor numbers of receptors and compound
numbers of compounds being agonists and/or
antagonists for said receptors, and additional
information about said receptors;

wherein said relation information file
stores a list to show the relation among the
compound numbers of the compounds as a
key, the enzyme numbers of the enzymes with
either said compound being a substrate, the
enzyme numbers of the enzymes with either
said compound being a product, the receptor
numbers of the receptors with either said compound
being an agonist, and the receptor numbers
of the receptors with either said
compound being an antagonist; and
said computer program product further having, 
in said program area, 

a computer-readable receptor information
detection program for, when said input means
accepts data about a compound, detecting
additional information about a receptor with
said compound being an agonist and/or an
antagonist, based on the data;

wherein said receptor information detection
program comprises

a fifth computer-readable process routine
for preparing from data about a compound
accepted through said input means said
canonical data uniquely indicating a chemical
structure of said compound, further
searching said compound information file,
based on said canonical data, and thereby
reading out a compound number corresponding
to said canonical data when said
canonical data exists in said compound
information file,



a sixth computer-readable process routine 
for reading, based on the compound 
number read out in said fifth process rou- 
tine, a receptor number of a receptor with 
the compound being an agonist or an 
antagonist out of said relation information 
file, 

a seventh computer-readable process rou- 
tine for reading at least additional informa- 
tion about the receptor of the receptor 
number read out in said sixth process rou- 
tine out of said receptor information file, 
and 

an eighth computer-readable process rou- 
tine for indicating at least the additional 
information about the receptor read out in 
said seventh process routine on said dis- 
play means. 

17. The biochemical information computer program
product according to Claim 15 or 16, said computer
program product further having, in said program
area,

a computer-readable reaction path detection
program for, when said input means accepts
data about a predetermined compound
selected from a plurality of compounds constituting
a reaction path, detecting the reaction
path of said plurality of compounds, based on
the data;

wherein said reaction path detection 
program comprises 

a ninth computer-readable process routine
for preparing from the data about the compound
accepted through said input means
said canonical data uniquely indicating a
chemical structure of said compound, further
searching said compound information
file, based on the canonical data, and
thereby reading out a compound number
corresponding to said canonical data when
said canonical data exists in said compound
information file,
a tenth computer-readable process routine
for reading, based on the compound
number read out in said ninth process routine,
an enzyme number of an enzyme with
the compound being a substrate and an
enzyme number of an enzyme with the
compound being a product out of said relation
information file,

an eleventh computer-readable process
routine for reading, based on each enzyme
number read out in said tenth process routine,
a compound number of a compound
being a substrate for said enzyme and a
compound number of a compound being a
product by said enzyme out of said
enzyme information file,
a twelfth computer-readable process routine
for repeating a process by said tenth
process routine and a process by said
eleventh process routine to obtain compounds
and enzymes within the predetermined
reaction path, and
a thirteenth computer-readable process
routine for indicating from enzyme numbers
read out in said tenth process routine
and compound numbers read out in said
eleventh process routine a reaction
scheme diagram of these compounds
along the reaction path on said display
means.

18. A biochemical information computer program product
used with an information processing apparatus
comprising input means for accepting input of
image data indicating biochemical information or
symbolic data indicating biochemical information,
display means for indicating at least a reaction
scheme diagram of a chemical reaction scheme,
and reading means for reading information out of a
computer-usable medium;

said computer program product comprising the
computer-usable medium having a file area for
recording a file and a program area for recording
a program and having computer-readable
file and program embodied in said medium, for
letting at least a reaction scheme diagram efficiently
be searched for and be indicated by said
display means, based on data input through
said input means;

said computer program product having, 
in said file area, 

a computer-readable compound information
file for storing a list showing the relation
between compound numbers of compounds
and canonical data corresponding to said compounds,
and additional information about said
compounds,
a computer-readable enzyme information file
for storing a list showing the relation among
enzyme numbers of enzymes, compound numbers
of compounds being substrates for said
enzymes, and compound numbers of compounds
being products by said enzymes, and
additional information about said enzymes, and
a computer-readable relation information file
for storing a list showing the relation among
compound numbers of compounds as a key,
enzyme numbers of enzymes with either said
compound being a substrate, and enzyme
numbers of enzymes with either said compound
being a product, and
having, in said program area, 
a computer-readable reaction path detection
program for, when said input means accepts
data about a predetermined compound
selected from a plurality of compounds constituting
a reaction path, detecting the reaction
path of said plurality of compounds, based on
the data;

wherein said reaction path detection 
program comprises 

a ninth computer-readable process routine 
for preparing from the data about the com- 
pound accepted through said input means 
said canonical data uniquely indicating a 
chemical structure of said compound, fur- 
ther searching said compound information 
file, based on the canonical data, and 
thereby reading out a compound number 
corresponding to said canonical data when 
said canonical data exists in said com- 
pound information file, 
a tenth computer-readable process routine 
for reading, based on the compound 
number read out in said ninth process rou- 
tine, an enzyme number of an enzyme with 
the compound being a substrate and an 
enzyme number of an enzyme with the 
compound being a product out of said rela- 
tion information file, 

an eleventh computer-readable process 
routine for reading, based on each enzyme 
number read out in said tenth process rou- 
tine, a compound number of a compound 
being a substrate for said enzyme and a 
compound number of a compound being a 
product by said enzyme out of said 
enzyme information file, 
a twelfth computer-readable process rou- 
tine for repeating a process by said tenth 
process routine and a process by said 
eleventh process routine to obtain com- 
pounds and enzymes within the predeter- 
mined reaction path, and 
a thirteenth computer-readable process 
routine for indicating from enzyme num- 
bers read out in said tenth process routine 
and compound numbers read out in said 
eleventh process routine a reaction 
scheme diagram of these compounds 
along the reaction path on said display 
means. 
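
As an illustration only, the file area and the ninth to twelfth process routines of Claim 18 can be sketched as three lookup tables and a breadth-first traversal. The table contents, the function name detect_reaction_path and the max_steps bound (standing in for the "predetermined reaction path") are assumptions made for this sketch and are not taken from the claims; the display step of the thirteenth routine is omitted.

```python
from collections import deque

# Hypothetical toy contents of the three files; every number and string is invented.
compound_file = {"canon-A": 1, "canon-B": 2, "canon-C": 3}  # canonical data -> compound number
enzyme_file = {  # enzyme number -> (substrate compound numbers, product compound numbers)
    101: ([1], [2]),
    102: ([2], [3]),
}
relation_file = {  # compound number -> (enzymes using it as substrate, enzymes producing it)
    1: ([101], []),
    2: ([102], [101]),
    3: ([], [102]),
}


def detect_reaction_path(canonical_data, max_steps=10):
    """Return (compound numbers, enzyme numbers) reached from the input compound."""
    # Ninth routine: canonical data -> compound number, if the compound is on file.
    start = compound_file.get(canonical_data)
    if start is None:
        return set(), set()
    compounds, enzymes = {start}, set()
    queue = deque([(start, 0)])
    # Twelfth routine: repeat the tenth and eleventh routines.
    while queue:
        compound, depth = queue.popleft()
        if depth >= max_steps:  # assumed stopping rule, not stated in the claim
            continue
        # Tenth routine: enzymes for which the compound is a substrate or a product.
        as_substrate, as_product = relation_file.get(compound, ([], []))
        for enzyme in as_substrate + as_product:
            if enzyme in enzymes:
                continue
            enzymes.add(enzyme)
            # Eleventh routine: substrates and products of that enzyme.
            substrates, products = enzyme_file[enzyme]
            for other in substrates + products:
                if other not in compounds:
                    compounds.add(other)
                    queue.append((other, depth + 1))
    return compounds, enzymes
```

Keying the relation information file by compound number lets each step of the traversal find the relevant enzymes with one lookup instead of scanning the whole enzyme information file.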

19. The biochemical information computer program 
product according to Claim 18, 

said computer program product further having,
in said file area,

a computer-readable receptor information file
storing a list showing the relation between
receptor numbers of receptors and compound
numbers of compounds being agonists and/or
antagonists for said receptors, and additional
information about said receptors;

wherein said relation information file
stores a list to show the relation among the
compound numbers of the compounds as a
key, the enzyme numbers of the enzymes with
either said compound being a substrate, the
enzyme numbers of the enzymes with either
said compound being a product, the receptor
numbers of the receptors with either said compound
being an agonist, and the receptor numbers
of the receptors with either said
compound being an antagonist; and
said computer program product further having,
in said program area,
a computer-readable receptor information
detection program for, when said input means
accepts data about a compound, detecting
additional information about a receptor with
said compound being an agonist and/or an
antagonist, based on the data;

wherein said receptor information detec- 
tion program comprises 

a fifth computer-readable process routine
for preparing from data about a compound
accepted through said input means said
canonical data uniquely indicating a chemical
structure of said compound, further
searching said compound information file,
based on said canonical data, and thereby
reading out a compound number corresponding
to said canonical data when said
canonical data exists in said compound
information file,
a sixth computer-readable process routine
for reading, based on the compound
number read out in said fifth process routine,
a receptor number of a receptor with
the compound being an agonist or an
antagonist out of said relation information
file,
a seventh computer-readable process routine
for reading at least additional information
about the receptor of the receptor
number read out in said sixth process routine
out of said receptor information file,
and
an eighth computer-readable process routine
for indicating at least the additional
information about the receptor read out in
said seventh process routine on said display
means.
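
As an illustration only, the fifth to seventh process routines of Claim 19 amount to a chain of three lookups. The table layouts, the receptor descriptions and the function name find_receptor_info are assumptions made for this sketch; the eighth routine (display) is represented simply by the returned list.

```python
# Hypothetical toy contents; every number and string is invented for this sketch.
compound_file = {"canon-X": 7}  # canonical data -> compound number
receptor_file = {  # receptor number -> agonists, antagonists, additional information
    201: {"agonists": [7], "antagonists": [], "info": "hypothetical receptor R1"},
    202: {"agonists": [], "antagonists": [7], "info": "hypothetical receptor R2"},
}
relation_file = {  # compound number -> receptors for which it is an agonist or antagonist
    7: {"agonist_of": [201], "antagonist_of": [202]},
}


def find_receptor_info(canonical_data):
    """Return (receptor number, role, additional information) for the compound."""
    # Fifth routine: canonical data -> compound number.
    number = compound_file.get(canonical_data)
    if number is None:
        return []
    entry = relation_file.get(number, {})
    results = []
    # Sixth routine: receptor numbers keyed by the compound number.
    for role, key in (("agonist", "agonist_of"), ("antagonist", "antagonist_of")):
        for receptor in entry.get(key, []):
            # Seventh routine: additional information from the receptor information file.
            results.append((receptor, role, receptor_file[receptor]["info"]))
    return results
```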



20. The biochemical information computer program
product according to any one of Claims 15 to 19,

wherein said input means accepts input of
characteristic data about each of atoms constituting
a compound and bonding pair data between atoms;

wherein said computer program product fur- 
ther has, in said program area, 

a computer-readable canonical data preparation
program for preparing canonical data
capable of uniquely specifying a chemical
structure of said compound, based on each
data accepted through said input means; and
wherein said canonical data preparation 
program comprises 

a computer-readable constituent atom 
classification routine for classifying the 
atoms into different classes each for equiv- 
alent atoms and assigning, to each atom, a 
different class number for each class, 
a computer-readable canonical number
assignment routine for assigning canonical
numbers uniquely corresponding to the
structure of said compound to the respective
atoms, based on the class numbers
assigned to the respective atoms in said
constituent atom classification routine, and
a computer-readable canonical data preparation
routine for preparing said canonical
data, based on the canonical numbers
assigned to the respective atoms in said
canonical number assignment routine.

21. The biochemical information computer program
product according to Claim 20,

wherein said constituent atom classification
routine assigns three types of attributes (a_i, b_ij, d_ij)
to each atom and, utilizing the fact that atoms different
in even only one of these attributes can be
determined to be not equivalent, assigns a different
class number for each equivalent atom to each
atom,

where among said three types of attributes
(a_i, b_ij, d_ij), a_i is a kind number of an atom of input
number i, b_ij is the number of bonds adjoining the
atom of input number i and having a bond kind
number being j, and d_ij is the number of routes that
can be traced from the atom of input number i
through j bonds in the shortest path;

wherein said canonical number assignment
routine is arranged so that when in a process for
assigning a canonical number to each atom in the
ascending order from 1 the canonical number 1 is
given to an atom with a highest priority of said class
number and thereafter canonical numbers up to the
canonical number n are assigned in that manner,
said canonical number assignment routine selects
an atom with a minimum canonical number out of
atoms already having their respective canonical
numbers and bonding to an atom having no canonical
number yet and then gives a canonical number
n + 1 to an atom with a highest priority of said class
number out of atoms bonding to said selected atom
and having no canonical number yet; and

wherein said canonical data preparation routine
gives three types of attributes (P_i, T_i, S_i) to each
atom and aligns these attributes in line to prepare
said canonical data,

where among said three types of attributes
(P_i, T_i, S_i), P_i is a canonical number of an atom
bonding to an atom of canonical number i and having
a minimum canonical number, T_i is a symbol for
a type of a bond between the atom of canonical
number i and the atom of canonical number P_i, and
S_i is a symbol for a kind of the atom of canonical
number i.
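
As an illustration only, the three routines of Claims 20 and 21 can be sketched as follows. Several points are assumptions of this sketch rather than statements of the claims: d_ij is read as the number of atoms whose shortest-path distance from atom i is j; a lower class number is taken to mean higher priority; ties between atoms of the same class are broken by input order; the kind numbers in KIND and the symbols in BOND_SYMBOL are invented; and the molecule is assumed to be connected.

```python
from collections import Counter, deque

KIND = {"H": 1, "C": 6, "N": 7, "O": 8}  # assumed kind numbers (atomic numbers)
BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}   # assumed symbols for bond kind numbers


def adjacency(n_atoms, bonds):
    """bonds is a list of (atom index, atom index, bond kind number)."""
    adj = [dict() for _ in range(n_atoms)]
    for i, j, kind in bonds:
        adj[i][j] = kind
        adj[j][i] = kind
    return adj


def classify(atoms, adj):
    """Constituent atom classification: one class number per atom from (a_i, b_ij, d_ij)."""
    keys = []
    for i, symbol in enumerate(atoms):
        a = KIND[symbol]
        b = tuple(sorted(Counter(adj[i].values()).items()))  # (bond kind j, count b_ij)
        dist = {i: 0}
        queue = deque([i])
        while queue:  # breadth-first search gives shortest-path distances from atom i
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        counts = Counter(k for k in dist.values() if k > 0)
        d = tuple(sorted(counts.items()))  # (distance j, count d_ij)
        keys.append((a, b, d))
    rank = {key: n for n, key in enumerate(sorted(set(keys)), start=1)}
    return [rank[key] for key in keys]


def canonical_numbers(adj, classes):
    """Canonical number assignment; 0 marks an atom that has no number yet."""
    n = len(adj)
    number = [0] * n
    number[min(range(n), key=lambda i: (classes[i], i))] = 1
    for nxt in range(2, n + 1):
        # Numbered atoms that still bond to an atom without a number.
        frontier = [i for i in range(n)
                    if number[i] and any(not number[v] for v in adj[i])]
        if not frontier:  # disconnected input is outside this sketch
            break
        selected = min(frontier, key=lambda i: number[i])
        candidates = [v for v in adj[selected] if not number[v]]
        number[min(candidates, key=lambda v: (classes[v], v))] = nxt
    return number


def canonical_data(atoms, bonds):
    """Canonical data preparation: (P_i, T_i, S_i) per atom, aligned in canonical order."""
    adj = adjacency(len(atoms), bonds)
    number = canonical_numbers(adj, classify(atoms, adj))
    parts = []
    for i in sorted(range(len(atoms)), key=lambda k: number[k]):
        if adj[i]:
            p = min(adj[i], key=lambda v: number[v])  # bonded atom with minimum number
            parts.append(f"{number[p]}{BOND_SYMBOL[adj[i][p]]}{atoms[i]}")
        else:
            parts.append(f"0{atoms[i]}")  # lone atom: no P_i or T_i (assumption)
    return " ".join(parts)
```

With these assumptions, the heavy-atom skeleton of ethanol gives the same string whether it is entered as C, C, O or as O, C, C, while dimethyl ether gives a different one. Because the triples record only one bond per atom, ring-closure bonds are not captured, and the input-order tie-break is a simplification that does not guarantee uniqueness for every structure.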

22. The biochemical information computer program
product according to any one of Claims 15 to 21,
wherein said computer-usable medium is a disk
type recording medium or a tape type recording
medium.